NEISSERIAL ANTIGENS

This invention relates to antigens from Neisseria bacteria.

BACKGROUND ART

Neisseria meningitidis and Neisseria gonorrhoeae are non-motile, Gram-negative diplococci that
are pathogenic in humans. N. meningitidis colonises the pharynx and causes meningitis (and,
occasionally, septicaemia in the absence of meningitis); N. gonorrhoeae colonises the genital tract
and causes gonorrhea. Although they colonise different areas of the body and cause completely
different diseases, the two pathogens are closely related. One feature that clearly
differentiates meningococcus from gonococcus is the polysaccharide capsule that is
present in all pathogenic meningococci.

N. gonorrhoeae caused approximately 800,000 cases per year during the period 1983-1990 in the
United States alone (chapter by Meitzner & Cohen, "Vaccines Against Gonococcal Infection", In:
New Generation Vaccines, 2nd edition, ed. Levine, Woodrow, Kaper, & Cobon, Marcel Dekker,
New York, 1997, pp. 817-842). The disease causes significant morbidity but limited mortality.
Vaccination against N. gonorrhoeae would be highly desirable, but repeated attempts have failed.
The main candidate antigens for this vaccine are surface-exposed proteins such as pili, porins,
opacity-associated proteins (Opas) and other surface-exposed proteins such as the Lip, Laz, IgA1
protease and transferrin-binding proteins. The lipooligosaccharide (LOS) has also been suggested
as a vaccine (Meitzner & Cohen, supra).

N. meningitidis causes both endemic and epidemic disease. In the United States the attack rate is
0.6-1 per 100,000 persons per year, and it can be much greater during outbreaks (see Lieberman
et al (1996) Safety and Immunogenicity of a Serogroups A/C Neisseria meningitidis
Oligosaccharide-Protein Conjugate Vaccine in Young Children. JAMA 275(19):1499-1503;
Schuchat et al (1997) Bacterial Meningitis in the United States in 1995. N Engl J Med 337(14):970-976).
In developing countries, endemic disease rates are much higher and during epidemics
incidence rates can reach 500 cases per 100,000 persons per year. Mortality is extremely high, at
10-20% in the United States, and much higher in developing countries. Following the introduction
of the conjugate vaccine against Haemophilus influenzae, N. meningitidis is the major cause of
bacterial meningitis at all ages in the United States (Schuchat et al (1997) supra).

Based on the organism's capsular polysaccharide, 12 serogroups of N. meningitidis have been
identified. Group A is the pathogen most often implicated in epidemic disease in sub-Saharan
Africa. Serogroups B and C are responsible for the vast majority of cases in the United States and
in most developed countries. Serogroups W135 and Y are responsible for the rest of the cases in
the United States and developed countries. The meningococcal vaccine currently in use is a
tetravalent polysaccharide vaccine composed of serogroups A, C, Y and W135. Although
efficacious in adolescents and adults, it induces a poor immune response and short duration of
protection, and cannot be used in infants [eg. Morbidity and Mortality Weekly Report, Vol. 46, No.
RR-5 (1997)]. This is because polysaccharides are T-cell independent antigens that induce a weak
immune response that cannot be boosted by repeated immunization. Following the success of the
vaccination against H. influenzae, conjugate vaccines against serogroups A and C have been
developed and are at the final stage of clinical testing (Zollinger WD "New and Improved Vaccines
Against Meningococcal Disease" in: New Generation Vaccines, supra, pp. 469-488; Lieberman et
al (1996) supra; Costantino et al (1992) Development and phase I clinical testing of a conjugate
vaccine against meningococcus A and C. Vaccine 10:691-698).

Meningococcus B remains a problem, however. This serotype currently is responsible for
approximately 50% of total meningitis in the United States, Europe, and South America. The
polysaccharide approach cannot be used because the menB capsular polysaccharide is a polymer
of α(2-8)-linked N-acetyl neuraminic acid that is also present in mammalian tissue. This results in
tolerance to the antigen; indeed, if an immune response were elicited, it would be anti-self, and
therefore undesirable. In order to avoid induction of autoimmunity and to induce a protective
immune response, the capsular polysaccharide has, for instance, been chemically modified by
substituting the N-acetyl groups with N-propionyl groups, leaving the specific antigenicity
unaltered (Romero & Outschoorn (1994) Current status of Meningococcal group B vaccine
candidates: capsular or non-capsular? Clin Microbiol Rev 7(4):559-575).

Alternative approaches to menB vaccines have used complex mixtures of outer membrane proteins
(OMPs), containing either the OMPs alone, or OMPs enriched in porins, or deleted of the class 4
OMPs that are believed to induce antibodies that block bactericidal activity. This approach
produces vaccines that are not well characterized. They are able to protect against the homologous
strain, but are not effective at large, where there are many antigenic variants of the outer membrane
proteins. To overcome the antigenic variability, multivalent vaccines containing up to nine different
porins have been constructed (eg. Poolman JT (1992) Development of a meningococcal vaccine.
Infect Agents Dis, 4:13-28). Additional proteins to be used in outer membrane vaccines have been
the opa and opc proteins, but none of these approaches have been able to overcome the antigenic
variability (eg. Ala'Aldeen & Borriello (1996) The meningococcal transferrin-binding proteins 1
and 2 are both surface exposed and generate bactericidal antibodies capable of killing homologous
and heterologous strains. Vaccine 14(1):49-53).

A certain amount of sequence data is available for meningococcal and gonococcal genes and
proteins (eg. EP-A-0467714, WO96/29412), but this is by no means complete. The provision of
further sequences could provide an opportunity to identify secreted or surface-exposed proteins that
are presumed targets for the immune system and which are not antigenically variable. For instance,
some of the identified proteins could be components of efficacious vaccines against meningococcus
B, some could be components of vaccines against all meningococcal serotypes, and others could
be components of vaccines against all pathogenic Neisseriae.

THE INVENTION 

The invention provides proteins comprising the Neisserial amino acid sequences disclosed in the
examples. These sequences relate to N. meningitidis or N. gonorrhoeae.

It also provides proteins comprising sequences homologous (ie. having sequence identity) to the
Neisserial amino acid sequences disclosed in the examples. Depending on the particular sequence,
the degree of identity is preferably greater than 50% (eg. 65%, 80%, 90%, or more). These
homologous proteins include mutants and allelic variants of the sequences disclosed in the
examples. Typically, 50% identity or more between two proteins is considered to be an indication of
functional equivalence. Identity between the proteins is preferably determined by the Smith-Waterman
homology search algorithm as implemented in the MPSRCH program (Oxford Molecular), using an
affine gap search with parameters gap open penalty=12 and gap extension penalty=1.
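
As an illustration only, the identity calculation described above might be sketched as follows. This is not the MPSRCH implementation: the match and mismatch scores (5 and -4), the convention that the first gapped position costs the gap open penalty and each further position costs the gap extension penalty, and the definition of identity as identical positions divided by aligned columns (gaps included) are all assumptions made for the sketch. The function name is hypothetical.

```python
def smith_waterman_identity(a, b, match=5, mismatch=-4, gap_open=12, gap_extend=1):
    """Return (score, percent_identity) for the best local alignment of a and b.

    Affine-gap Smith-Waterman (Gotoh recurrences). The scoring values are
    illustrative assumptions, not those of any particular program.
    """
    n, m = len(a), len(b)
    neg = float("-inf")
    H = [[0] * (m + 1) for _ in range(n + 1)]    # best local score ending at (i, j)
    E = [[neg] * (m + 1) for _ in range(n + 1)]  # alignment ends with a gap in a
    F = [[neg] * (m + 1) for _ in range(n + 1)]  # alignment ends with a gap in b
    best, best_pos = 0, (0, 0)

    def sub(i, j):
        # substitution score for aligning a[i-1] with b[j-1]
        return match if a[i - 1] == b[j - 1] else mismatch

    for i in range(1, n + 1):
        for j in range(1, m + 1):
            E[i][j] = max(H[i][j - 1] - gap_open, E[i][j - 1] - gap_extend)
            F[i][j] = max(H[i - 1][j] - gap_open, F[i - 1][j] - gap_extend)
            H[i][j] = max(0, H[i - 1][j - 1] + sub(i, j), E[i][j], F[i][j])
            if H[i][j] > best:
                best, best_pos = H[i][j], (i, j)

    # Trace back from the best cell, counting identical and total columns.
    i, j = best_pos
    state, identical, columns = "H", 0, 0
    while i > 0 and j > 0:
        if state == "H":
            if H[i][j] == 0:
                break
            if H[i][j] == H[i - 1][j - 1] + sub(i, j):
                identical += a[i - 1] == b[j - 1]
                columns += 1
                i, j = i - 1, j - 1
            elif H[i][j] == E[i][j]:
                state = "E"
            else:
                state = "F"
        elif state == "E":
            columns += 1
            if E[i][j] == H[i][j - 1] - gap_open:
                state = "H"  # the gap was opened at this column
            j -= 1
        else:
            columns += 1
            if F[i][j] == H[i - 1][j] - gap_open:
                state = "H"
            i -= 1

    percent = 100.0 * identical / columns if columns else 0.0
    return best, percent


if __name__ == "__main__":
    print(smith_waterman_identity("ACDEFGHIK", "ACDQFGHIK"))
```

A real comparison of protein sequences would normally use a substitution matrix rather than a flat match/mismatch score; the flat score is used here only to keep the sketch short.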

The invention further provides proteins comprising fragments of the Neisserial amino acid
sequences disclosed in the examples. The fragments should comprise at least n consecutive amino
acids from the sequences and, depending on the particular sequence, n is 7 or more (eg. 8, 10, 12,
14, 16, 18, 20 or more). Preferably the fragments comprise an epitope from the sequence.
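
The "at least n consecutive amino acids" criterion can be illustrated with a short sketch. The function names and the example sequences are hypothetical and are not taken from the examples; the default n=7 follows the minimum stated above.

```python
def fragments(sequence, n=7):
    """Yield every window of n consecutive residues from sequence."""
    for start in range(len(sequence) - n + 1):
        yield sequence[start:start + n]


def shares_fragment(candidate, reference, n=7):
    """Return True if candidate contains at least n consecutive residues of reference."""
    return any(frag in candidate for frag in fragments(reference, n))


if __name__ == "__main__":
    reference = "MKTAYIAKQR"  # made-up sequence for illustration
    print(list(fragments(reference, 7)))
    print(shares_fragment("GGKTAYIAKGG", reference))
```

The sketch checks only the length criterion; whether a fragment comprises an epitope is an experimental question that it does not address.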

The proteins of the invention can, of course, be prepared by various means (eg. recombinant
expression, purification from cell culture, chemical synthesis etc.) and in various forms (eg. native,
fusions etc.). They are preferably prepared in substantially pure or isolated form (ie. substantially
free from other Neisserial or host cell proteins).

According to a further aspect, the invention provides antibodies which bind to these proteins. These
may be polyclonal or monoclonal and may be produced by any suitable means.

According to a further aspect, the invention provides nucleic acid comprising the Neisserial
nucleotide sequences disclosed in the examples. In addition, the invention provides nucleic acid
comprising sequences homologous (ie. having sequence identity) to the Neisserial nucleotide
sequences disclosed in the examples.

Furthermore, the invention provides nucleic acid which can hybridise to the Neisserial nucleic acid
disclosed in the examples, preferably under "high stringency" conditions (eg. 65°C in a 0.1xSSC,
0.5% SDS solution).

Nucleic acid comprising fragments of these sequences is also provided. These should comprise
at least n consecutive nucleotides from the Neisserial sequences and, depending on the particular
sequence, n is 10 or more (eg. 12, 14, 15, 18, 20, 25, 30, 35, 40 or more).

According to a further aspect, the invention provides nucleic acid encoding the proteins and protein
fragments of the invention.

It should also be appreciated that the invention provides nucleic acid comprising sequences
complementary to those described above (eg. for antisense or probing purposes).
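
For illustration, a complementary strand of a DNA sequence can be derived as in the following sketch. The function name is hypothetical, the input is assumed to contain only the unambiguous bases A, C, G and T, and the example sequence is made up.

```python
# Translation table for the four unambiguous DNA bases (upper and lower case).
_COMPLEMENT = str.maketrans("ACGTacgt", "TGCAtgca")


def reverse_complement(dna):
    """Return the complementary strand of dna, written 5' to 3'."""
    return dna.translate(_COMPLEMENT)[::-1]


if __name__ == "__main__":
    print(reverse_complement("ATGGCC"))
```

Characters outside the table (for example ambiguity codes such as N) pass through unchanged in this sketch, so a stricter version would validate its input first.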

Nucleic acid according to the invention can, of course, be prepared in many ways (eg. by chemical 
synthesis, from genomic or cDNA libraries, from the organism itself etc.) and can take various 
forms (eg. single stranded, double stranded, vectors, probes etc.). 

In addition, the term "nucleic acid" includes DNA and RNA, and also their analogues, such as
those containing modified backbones, and also peptide nucleic acids (PNA) etc.

According to a further aspect, the invention provides vectors comprising nucleotide sequences of
the invention (eg. expression vectors) and host cells transformed with such vectors.

According to a further aspect, the invention provides compositions comprising protein, antibody,
and/or nucleic acid according to the invention. These compositions may be suitable as vaccines,
for instance, or as diagnostic reagents, or as immunogenic compositions.

The invention also provides nucleic acid, protein, or antibody according to the invention for use
as medicaments (eg. as vaccines) or as diagnostic reagents. It also provides the use of nucleic acid,
protein, or antibody according to the invention in the manufacture of: (i) a medicament for treating
or preventing infection due to Neisserial bacteria; (ii) a diagnostic reagent for detecting the
presence of Neisserial bacteria or of antibodies raised against Neisserial bacteria; and/or (iii) a
reagent which can raise antibodies against Neisserial bacteria. Said Neisserial bacteria may be any
species or strain (such as N. gonorrhoeae, or any strain of N. meningitidis, such as strain A, strain
B or strain C).

The invention also provides a method of treating a patient, comprising administering to the patient
a therapeutically effective amount of nucleic acid, protein, and/or antibody according to the
invention.

According to further aspects, the invention provides various processes. 

A process for producing proteins of the invention is provided, comprising the step of culturing a 
host cell according to the invention under conditions which induce protein expression. 

A process for producing protein or nucleic acid of the invention is provided, wherein the protein
or nucleic acid is synthesised in part or in whole using chemical means.

A process for detecting polynucleotides of the invention is provided, comprising the steps of: (a)
contacting a nucleic acid probe according to the invention with a biological sample under hybridizing
conditions to form duplexes; and (b) detecting said duplexes.

A process for detecting proteins of the invention is provided, comprising the steps of: (a) contacting
an antibody according to the invention with a biological sample under conditions suitable for the
formation of antibody-antigen complexes; and (b) detecting said complexes.

A summary of standard techniques and procedures which may be employed in order to perform the
invention (eg. to utilise the disclosed sequences for vaccination or diagnostic purposes) follows.
This summary is not a limitation on the invention but, rather, gives examples that may be used, but
are not required.

General

The practice of the present invention will employ, unless otherwise indicated, conventional
techniques of molecular biology, microbiology, recombinant DNA, and immunology, which are
within the skill of the art. Such techniques are explained fully in the literature eg. Sambrook
Molecular Cloning; A Laboratory Manual, Second Edition (1989); DNA Cloning, Volumes I and
II (D.N. Glover ed. 1985); Oligonucleotide Synthesis (M.J. Gait ed. 1984); Nucleic Acid
Hybridization (B.D. Hames & S.J. Higgins eds. 1984); Transcription and Translation (B.D. Hames
& S.J. Higgins eds. 1984); Animal Cell Culture (R.I. Freshney ed. 1986); Immobilized Cells and
Enzymes (IRL Press, 1986); B. Perbal, A Practical Guide to Molecular Cloning (1984); the
Methods in Enzymology series (Academic Press, Inc.), especially volumes 154 & 155; Gene
Transfer Vectors for Mammalian Cells (J.H. Miller and M.P. Calos eds. 1987, Cold Spring Harbor
Laboratory); Mayer and Walker, eds. (1987), Immunochemical Methods in Cell and Molecular
Biology (Academic Press, London); Scopes, (1987) Protein Purification: Principles and Practice,
Second Edition (Springer-Verlag, N.Y.), and Handbook of Experimental Immunology, Volumes
I-IV (D.M. Weir and C.C. Blackwell eds 1986).

Standard abbreviations for nucleotides and amino acids are used in this specification.

All publications, patents, and patent applications cited herein are incorporated in full by reference.
In particular, the contents of UK patent applications 9723516.2, 9724190.5, 9724386.9, 9725158.1,
9726147.3, 9800759.4, and 9819016.8 are incorporated herein.

Definitions 

A composition containing X is "substantially free of" Y when at least 85% by weight of the total
X+Y in the composition is X. Preferably, X comprises at least about 90% by weight of the total of
X+Y in the composition, more preferably at least about 95% or even 99% by weight.
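
The weight criterion in this definition can be expressed as a small calculation. The following sketch is illustrative only; the function names are hypothetical, and the default threshold of 0.85 corresponds to the 85% figure above (0.90, 0.95 or 0.99 can be passed for the preferred levels).

```python
def purity_fraction(weight_x, weight_y):
    """Return the weight of X as a fraction of the total weight of X+Y."""
    total = weight_x + weight_y
    if total <= 0:
        raise ValueError("total weight of X+Y must be positive")
    return weight_x / total


def substantially_free(weight_x, weight_y, threshold=0.85):
    """Return True if X makes up at least `threshold` of X+Y by weight."""
    return purity_fraction(weight_x, weight_y) >= threshold


if __name__ == "__main__":
    # 9.0 mg of X with 1.0 mg of Y gives a fraction of 0.9
    print(purity_fraction(9.0, 1.0), substantially_free(9.0, 1.0))
```

The weights may be in any unit, provided the same unit is used for X and Y.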

The term "comprising" means "including" as well as "consisting" eg. a composition "comprising"
X may consist exclusively of X or may include something additional to X, such as X+Y.

The term "heterologous" refers to two biological components that are not found together in nature.
The components may be host cells, genes, or regulatory regions, such as promoters. Although the
heterologous components are not found together in nature, they can function together, as when a
promoter heterologous to a gene is operably linked to the gene. Another example is where a
Neisserial sequence is heterologous to a mouse host cell. A further example would be two epitopes
from the same or different proteins which have been assembled in a single protein in an
arrangement not found in nature.

An "origin of replication" is a polynucleotide sequence that initiates and regulates replication of
polynucleotides, such as an expression vector. The origin of replication behaves as an autonomous
unit of polynucleotide replication within a cell, capable of replication under its own control. An
origin of replication may be needed for a vector to replicate in a particular host cell. With certain
origins of replication, an expression vector can be reproduced at a high copy number in the
presence of the appropriate proteins within the cell. Examples of origins are the autonomously
replicating sequences, which are effective in yeast; and the viral T-antigen, effective in COS-7
cells.

A "mutant" sequence is defined as DNA, RNA or amino acid sequence differing from but having
sequence identity with the native or disclosed sequence. Depending on the particular sequence, the
degree of sequence identity between the native or disclosed sequence and the mutant sequence is
preferably greater than 50% (eg. 60%, 70%, 80%, 90%, 95%, 99% or more, calculated using the
Smith-Waterman algorithm as described above). As used herein, an "allelic variant" of a nucleic
acid molecule, or region, for which nucleic acid sequence is provided herein is a nucleic acid
molecule, or region, that occurs essentially at the same locus in the genome of another or second
isolate, and that, due to natural variation caused by, for example, mutation or recombination, has
a similar but not identical nucleic acid sequence. A coding region allelic variant typically encodes
a protein having similar activity to that of the protein encoded by the gene to which it is being
compared. An allelic variant can also comprise an alteration in the 5' or 3' untranslated regions of
the gene, such as in regulatory control regions (eg. see US patent 5,753,235).

Expression systems 

The Neisserial nucleotide sequences can be expressed in a variety of different expression systems;
for example those used with mammalian cells, baculoviruses, plants, bacteria, and yeast.

i. Mammalian Systems

Mammalian expression systems are known in the art. A mammalian promoter is any DNA
sequence capable of binding mammalian RNA polymerase and initiating the downstream (3')
transcription of a coding sequence (eg. structural gene) into mRNA. A promoter will have a
transcription initiating region, which is usually placed proximal to the 5' end of the coding
sequence, and a TATA box, usually located 25-30 base pairs (bp) upstream of the transcription
initiation site. The TATA box is thought to direct RNA polymerase II to begin RNA synthesis at
the correct site. A mammalian promoter will also contain an upstream promoter element, usually
located within 100 to 200 bp upstream of the TATA box. An upstream promoter element
determines the rate at which transcription is initiated and can act in either orientation [Sambrook
et al. (1989) "Expression of Cloned Genes in Mammalian Cells." In Molecular Cloning: A
Laboratory Manual, 2nd ed.].

Mammalian viral genes are often highly expressed and have a broad host range; therefore sequences
encoding mammalian viral genes provide particularly useful promoter sequences. Examples include
the SV40 early promoter, mouse mammary tumor virus LTR promoter, adenovirus major late
promoter (Ad MLP), and herpes simplex virus promoter. In addition, sequences derived from non-viral
genes, such as the murine metallothionein gene, also provide useful promoter sequences.
Expression may be either constitutive or regulated (inducible); depending on the promoter, expression
can be induced with glucocorticoid in hormone-responsive cells.

The presence of an enhancer element (enhancer), combined with the promoter elements described
above, will usually increase expression levels. An enhancer is a regulatory DNA sequence that can
stimulate transcription up to 1000-fold when linked to homologous or heterologous promoters, with
synthesis beginning at the normal RNA start site. Enhancers are also active when they are placed
upstream or downstream from the transcription initiation site, in either normal or flipped orientation,
or at a distance of more than 1000 nucleotides from the promoter [Maniatis et al. (1987)
Science 236:1237; Alberts et al. (1989) Molecular Biology of the Cell, 2nd ed]. Enhancer elements
derived from viruses may be particularly useful, because they usually have a broader host range.
Examples include the SV40 early gene enhancer [Dijkema et al (1985) EMBO J. 4:761] and the
enhancer/promoters derived from the long terminal repeat (LTR) of the Rous Sarcoma Virus
[Gorman et al. (1982b) Proc. Natl. Acad. Sci. 79:6777] and from human cytomegalovirus [Boshart
et al. (1985) Cell 41:521]. Additionally, some enhancers are regulatable and become active only
in the presence of an inducer, such as a hormone or metal ion [Sassone-Corsi and Borelli (1986)
Trends Genet 2:215; Maniatis et al. (1987) Science 236:1237].

A DNA molecule may be expressed intracellularly in mammalian cells. A promoter sequence may be
directly linked with the DNA molecule, in which case the first amino acid at the N-terminus of the
recombinant protein will always be a methionine, which is encoded by the ATG start codon. If desired,
the N-terminus may be cleaved from the protein by in vitro incubation with cyanogen bromide.

Alternatively, foreign proteins can also be secreted from the cell into the growth media by creating
chimeric DNA molecules that encode a fusion protein comprised of a leader sequence fragment that
provides for secretion of the foreign protein in mammalian cells. Preferably, there are processing
sites encoded between the leader fragment and the foreign gene that can be cleaved either in vivo
or in vitro. The leader sequence fragment usually encodes a signal peptide comprised of
hydrophobic amino acids which direct the secretion of the protein from the cell. The adenovirus
tripartite leader is an example of a leader sequence that provides for secretion of a foreign protein
in mammalian cells.

Usually, transcription termination and polyadenylation sequences recognized by mammalian cells
are regulatory regions located 3' to the translation stop codon and thus, together with the promoter
elements, flank the coding sequence. The 3' terminus of the mature mRNA is formed by site-specific
post-transcriptional cleavage and polyadenylation [Birnstiel et al. (1985) Cell 41:349;
Proudfoot and Whitelaw (1988) "Termination and 3' end processing of eukaryotic RNA." In
Transcription and splicing (ed. B.D. Hames and D.M. Glover); Proudfoot (1989) Trends Biochem.
Sci. 14:105]. These sequences direct the transcription of an mRNA which can be translated into the
polypeptide encoded by the DNA. Examples of transcription terminator/polyadenylation signals
include those derived from SV40 [Sambrook et al (1989) "Expression of cloned genes in cultured
mammalian cells." In Molecular Cloning: A Laboratory Manual].

Usually, the above described components, comprising a promoter, polyadenylation signal, and
transcription termination sequence are put together into expression constructs. Enhancers, introns
with functional splice donor and acceptor sites, and leader sequences may also be included in an
expression construct, if desired. Expression constructs are often maintained in a replicon, such as
an extrachromosomal element (eg. plasmids) capable of stable maintenance in a host, such as
mammalian cells or bacteria. Mammalian replication systems include those derived from animal
viruses, which require trans-acting factors to replicate. For example, plasmids containing the
replication systems of papovaviruses, such as SV40 [Gluzman (1981) Cell 23:175] or
polyomavirus, replicate to extremely high copy number in the presence of the appropriate viral T
antigen. Additional examples of mammalian replicons include those derived from bovine
papillomavirus and Epstein-Barr virus. Additionally, the replicon may have two replication systems,
thus allowing it to be maintained, for example, in mammalian cells for expression and in a
prokaryotic host for cloning and amplification. Examples of such mammalian-bacteria shuttle
vectors include pMT2 [Kaufman et al. (1989) Mol. Cell. Biol. 9:946] and pHEBO [Shimizu et al.
(1986) Mol. Cell. Biol. 6:1074].

The transformation procedure used depends upon the host to be transformed. Methods for
introduction of heterologous polynucleotides into mammalian cells are known in the art and include
dextran-mediated transfection, calcium phosphate precipitation, polybrene mediated transfection,
protoplast fusion, electroporation, encapsulation of the polynucleotide(s) in liposomes, and direct
microinjection of the DNA into nuclei.

Mammalian cell lines available as hosts for expression are known in the art and include many
immortalized cell lines available from the American Type Culture Collection (ATCC), including
but not limited to, Chinese hamster ovary (CHO) cells, HeLa cells, baby hamster kidney (BHK)
cells, monkey kidney cells (COS), human hepatocellular carcinoma cells (eg. Hep G2), and a
number of other cell lines.

ii. Baculovirus Systems

The polynucleotide encoding the protein can also be inserted into a suitable insect expression vector,
and is operably linked to the control elements within that vector. Vector construction employs
techniques which are known in the art. Generally, the components of the expression system include
a transfer vector, usually a bacterial plasmid, which contains both a fragment of the baculovirus
genome, and a convenient restriction site for insertion of the heterologous gene or genes to be
expressed; a wild type baculovirus with a sequence homologous to the baculovirus-specific fragment
in the transfer vector (this allows for the homologous recombination of the heterologous gene into
the baculovirus genome); and appropriate insect host cells and growth media.

After inserting the DNA sequence encoding the protein into the transfer vector, the vector and the
wild type viral genome are transfected into an insect host cell where the vector and viral genome
are allowed to recombine. The packaged recombinant virus is expressed and recombinant plaques
are identified and purified. Materials and methods for baculovirus/insect cell expression systems
are commercially available in kit form from, inter alia, Invitrogen, San Diego CA ("MaxBac" kit).
These techniques are generally known to those skilled in the art and fully described in Summers
and Smith, Texas Agricultural Experiment Station Bulletin No. 1555 (1987) (hereinafter "Summers
and Smith").

Prior to inserting the DNA sequence encoding the protein into the baculovirus genome, the above
described components, comprising a promoter, leader (if desired), coding sequence of interest, and
transcription termination sequence, are usually assembled into an intermediate transplacement
construct (transfer vector). This construct may contain a single gene and operably linked regulatory
elements; multiple genes, each with its own set of operably linked regulatory elements; or multiple
genes, regulated by the same set of regulatory elements. Intermediate transplacement constructs are
often maintained in a replicon, such as an extrachromosomal element (eg. plasmids) capable of stable
maintenance in a host, such as a bacterium. The replicon will have a replication system, thus allowing
it to be maintained in a suitable host for cloning and amplification.

Currently, the most commonly used transfer vector for introducing foreign genes into AcNPV is
pAc373. Many other vectors, known to those of skill in the art, have also been designed. These
include, for example, pVL985 (which alters the polyhedrin start codon from ATG to ATT, and
which introduces a BamHI cloning site 32 basepairs downstream from the ATT; see Luckow and
Summers, Virology (1989) 17:31).

The plasmid usually also contains the polyhedrin polyadenylation signal (Miller et al. (1988) Ann.
Rev. Microbiol. 42:177) and a prokaryotic ampicillin-resistance (amp) gene and origin of
replication for selection and propagation in E. coli.

Baculovirus transfer vectors usually contain a baculovirus promoter. A baculovirus promoter is any
DNA sequence capable of binding a baculovirus RNA polymerase and initiating the downstream
(5' to 3') transcription of a coding sequence (eg. structural gene) into mRNA. A promoter will have
a transcription initiation region which is usually placed proximal to the 5' end of the coding
sequence. This transcription initiation region usually includes an RNA polymerase binding site and
a transcription initiation site. A baculovirus transfer vector may also have a second domain called
an enhancer, which, if present, is usually distal to the structural gene. Expression may be either
regulated or constitutive.

Structural genes, abundantly transcribed at late times in a viral infection cycle, provide particularly
useful promoter sequences. Examples include sequences derived from the gene encoding the viral
polyhedron protein, Friesen et al., (1986) "The Regulation of Baculovirus Gene Expression," in:
The Molecular Biology of Baculoviruses (ed. Walter Doerfler); EPO Publ. Nos. 127 839 and 155
476; and the gene encoding the p10 protein, Vlak et al., (1988), J. Gen. Virol. 69:765.

DNA encoding suitable signal sequences can be derived from genes for secreted insect or
baculovirus proteins, such as the baculovirus polyhedrin gene (Carbonell et al. (1988) Gene,
73:409). Alternatively, since the signals for mammalian cell posttranslational modifications (such
as signal peptide cleavage, proteolytic cleavage, and phosphorylation) appear to be recognized by
insect cells, and the signals required for secretion and nuclear accumulation also appear to be
conserved between the invertebrate cells and vertebrate cells, leaders of non-insect origin, such as
those derived from genes encoding human α-interferon, Maeda et al., (1985), Nature 315:592;
human gastrin-releasing peptide, Lebacq-Verheyden et al., (1988), Molec. Cell. Biol. 8:3129;
human IL-2, Smith et al., (1985) Proc. Natl. Acad. Sci. USA, 82:8404; mouse IL-3, Miyajima et
al., (1987) Gene 58:273; and human glucocerebrosidase, Martin et al. (1988) DNA, 7:99, can also
be used to provide for secretion in insects.

A recombinant polypeptide or polyprotein may be expressed intracellularly or, if it is expressed
with the proper regulatory sequences, it can be secreted. Good intracellular expression of nonfused
foreign proteins usually requires heterologous genes that ideally have a short leader sequence
containing suitable translation initiation signals preceding an ATG start signal. If desired,
methionine at the N-terminus may be cleaved from the mature protein by in vitro incubation with
cyanogen bromide.

Alternatively, recombinant polyproteins or proteins which are not naturally secreted can be secreted
from the insect cell by creating chimeric DNA molecules that encode a fusion protein comprised
of a leader sequence fragment that provides for secretion of the foreign protein in insects. The
leader sequence fragment usually encodes a signal peptide comprised of hydrophobic amino acids
which direct the translocation of the protein into the endoplasmic reticulum.

After insertion of the DNA sequence and/or the gene encoding the expression product precursor
of the protein, an insect cell host is co-transformed with the heterologous DNA of the transfer
vector and the genomic DNA of wild type baculovirus, usually by co-transfection. The promoter
and transcription termination sequence of the construct will usually comprise a 2-5kb section of the
baculovirus genome. Methods for introducing heterologous DNA into the desired site in the
baculovirus are known in the art. (See Summers and Smith supra; Ju et al. (1987); Smith et
al., Mol. Cell. Biol. (1983) 3:2156; and Luckow and Summers (1989)). For example, the insertion
can be into a gene such as the polyhedrin gene, by homologous double crossover recombination;
insertion can also be into a restriction enzyme site engineered into the desired baculovirus gene.
Miller et al., (1989), Bioessays 4:91. The DNA sequence, when cloned in place of the polyhedrin
gene in the expression vector, is flanked both 5' and 3' by polyhedrin-specific sequences and is
positioned downstream of the polyhedrin promoter.

The newly formed baculovirus expression vector is subsequently packaged into an infectious
recombinant baculovirus. Homologous recombination occurs at low frequency (between about 1%
and about 5%); thus, the majority of the virus produced after cotransfection is still wild-type virus.
Therefore, a method is necessary to identify recombinant viruses. An advantage of the expression
system is a visual screen allowing recombinant viruses to be distinguished. The polyhedrin protein,
which is produced by the native virus, is produced at very high levels in the nuclei of infected cells
at late times after viral infection. Accumulated polyhedrin protein forms occlusion bodies that also
contain embedded particles. These occlusion bodies, up to 15 μm in size, are highly refractile,
giving them a bright shiny appearance that is readily visualized under the light microscope. Cells
infected with recombinant viruses lack occlusion bodies. To distinguish recombinant virus from
wild-type virus, the transfection supernatant is plaqued onto a monolayer of insect cells by
techniques known to those skilled in the art. Namely, the plaques are screened under the light
microscope for the presence (indicative of wild-type virus) or absence (indicative of recombinant
virus) of occlusion bodies. "Current Protocols in Microbiology" Vol. 2 (Ausubel et al. eds) at 16.8
(Supp. 10, 1990); Summers and Smith, supra; Miller et al. (1989).
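
The screening burden implied by a recombination frequency of about 1% to about 5% can be estimated with a short calculation. The following is a minimal sketch only: it assumes that each plaque is independently recombinant with a fixed probability (a simplification not stated in the text), and the function name plaques_needed and the 95% default are illustrative choices.

```python
import math


def plaques_needed(recombinant_fraction: float, confidence: float = 0.95) -> int:
    """Smallest number of plaques n with P(at least one recombinant) >= confidence.

    Assumes every plaque is independently recombinant with probability
    `recombinant_fraction` (an illustrative simplification).
    """
    if not 0.0 < recombinant_fraction < 1.0:
        raise ValueError("recombinant_fraction must lie strictly between 0 and 1")
    if not 0.0 < confidence < 1.0:
        raise ValueError("confidence must lie strictly between 0 and 1")
    # P(no recombinant among n plaques) = (1 - p)**n, which must be <= 1 - confidence.
    n = math.log(1.0 - confidence) / math.log(1.0 - recombinant_fraction)
    return math.ceil(n)


if __name__ == "__main__":
    for p in (0.01, 0.05):
        print(f"recombinant fraction {p:.0%}: screen {plaques_needed(p)} plaques")
```

Under these assumptions several hundred plaques may need to be examined at the lower end of the frequency range, which is consistent with the need for a rapid visual screen.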

Recombinant baculovirus expression vectors have been developed for infection into several insect
cells. For example, recombinant baculoviruses have been developed for, inter alia: Aedes aegypti,
Autographa californica, Bombyx mori, Drosophila melanogaster, Spodoptera frugiperda, and
Trichoplusia ni (WO 89/046699; Carbonell et al., (1985) J. Virol. 56:153; Wright (1986) Nature
321:718; Smith et al., (1983) Mol. Cell. Biol. 3:2156; and see generally, Fraser, et al. (1989) In
Vitro Cell. Dev. Biol. 25:225).

Cells and cell culture media are commercially available for both direct and fusion expression of
heterologous polypeptides in a baculovirus/expression system; cell culture technology is generally
known to those skilled in the art. See, eg. Summers and Smith supra.

The modified insect cells may then be grown in an appropriate nutrient medium, which allows for
stable maintenance of the plasmid(s) present in the modified insect host. Where the expression product
gene is under inducible control, the host may be grown to high density, and expression induced.
Alternatively, where expression is constitutive, the product will be continuously expressed into the
medium and the nutrient medium must be continuously circulated, while removing the product of
interest and augmenting depleted nutrients. The product may be purified by such techniques as
chromatography, eg. HPLC, affinity chromatography, ion exchange chromatography, etc.;
electrophoresis; density gradient centrifugation; solvent extraction, or the like. As appropriate, the
product may be further purified, as required, so as to remove substantially any insect proteins which
are also secreted in the medium or result from lysis of insect cells, so as to provide a product which
is at least substantially free of host debris, eg. proteins, lipids and polysaccharides.

In order to obtain protein expression, recombinant host cells derived from the transformants are
incubated under conditions which allow expression of the recombinant protein encoding sequence.
These conditions will vary, dependent upon the host cell selected. However, the conditions are
readily ascertainable to those of ordinary skill in the art, based upon what is known in the art.
iii. Plant Systems

There are many plant cell culture and whole plant genetic expression systems known in the art.
Exemplary plant cellular genetic expression systems include those described in patents, such as:
US 5,693,506; US 5,659,122; and US 5,608,143. Additional examples of genetic expression in
plant cell culture have been described by Zenk, Phytochemistry 30:3861-3863 (1991). Descriptions
of plant protein signal peptides may be found in addition to the references described above in
Vaulcombe et al., Mol. Gen. Genet. 209:33-40 (1987); Chandler et al., Plant Molecular Biology
3:407-418 (1984); Rogers, J. Biol. Chem. 260:3731-3738 (1985); Rothstein et al., Gene 55:353-356
(1987); Whittier et al., Nucleic Acids Research 15:2515-2535 (1987); Wirsel et al., Molecular
Microbiology 3:3-14 (1989); Yu et al., Gene 122:247-253 (1992). A description of the regulation
of plant gene expression by the phytohormone, gibberellic acid and secreted enzymes induced by
gibberellic acid can be found in R.L. Jones and J. MacMillin, Gibberellins: in: Advanced Plant
Physiology, Malcolm B. Wilkins, ed., 1984 Pitman Publishing Limited, London, pp. 21-52.

References that describe other metabolically-regulated genes: Sheen, Plant Cell, 2:1027-1038
(1990); Maas et al., EMBO J. 9:3447-3452 (1990); Benkel and Hickey, Proc. Natl. Acad. Sci.
84:1337-1339 (1987).

Typically, using techniques known in the art, a desired polynucleotide sequence is inserted into an
expression cassette comprising genetic regulatory elements designed for operation in plants. The
expression cassette is inserted into a desired expression vector with companion sequences upstream
and downstream from the expression cassette suitable for expression in a plant host. The
companion sequences will be of plasmid or viral origin and provide necessary characteristics to the
vector to permit the vectors to move DNA from an original cloning host, such as bacteria, to the
desired plant host. The basic bacterial/plant vector construct will preferably provide a broad host
range prokaryote replication origin; a prokaryote selectable marker; and, for Agrobacterium
transformations, T DNA sequences for Agrobacterium-mediated transfer to plant chromosomes.
Where the heterologous gene is not readily amenable to detection, the construct will preferably also
have a selectable marker gene suitable for determining if a plant cell has been transformed. A
general review of suitable markers, for example for the members of the grass family, is found in
Wilmink and Dons, 1993, Plant Mol. Biol. Reptr, 11(2):165-185.

Sequences suitable for permitting integration of the heterologous sequence into the plant genome
are also recommended. These might include transposon sequences and the like for homologous
recombination as well as Ti sequences which permit random insertion of a heterologous expression
cassette into a plant genome. Suitable prokaryote selectable markers include resistance toward
antibiotics such as ampicillin or tetracycline. Other DNA sequences encoding additional functions
may also be present in the vector, as is known in the art.

The nucleic acid molecules of the subject invention may be included into an expression cassette
for expression of the protein(s) of interest. Usually, there will be only one expression cassette,
although two or more are feasible. The recombinant expression cassette will contain, in addition
to the heterologous protein encoding sequence, the following elements: a promoter region, plant 5'
untranslated sequences, initiation codon depending upon whether or not the structural gene comes
equipped with one, and a transcription and translation termination sequence. Unique restriction
enzyme sites at the 5' and 3' ends of the cassette allow for easy insertion into a pre-existing vector.

A heterologous coding sequence may be for any protein relating to the present invention. The
sequence encoding the protein of interest will encode a signal peptide which allows processing and
translocation of the protein, as appropriate, and will usually lack any sequence which might result
in the binding of the desired protein of the invention to a membrane. Since, for the most part, the
transcriptional initiation region will be for a gene which is expressed and translocated during
germination, by employing the signal peptide which provides for translocation, one may also
provide for translocation of the protein of interest. In this way, the protein(s) of interest will be
translocated from the cells in which they are expressed and may be efficiently harvested. Typically,
secretion in seeds is across the aleurone or scutellar epithelium layer into the endosperm of the
seed. While it is not required that the protein be secreted from the cells in which the protein is
produced, this facilitates the isolation and purification of the recombinant protein.

Since the ultimate expression of the desired gene product will be in a eucaryotic cell, it is desirable
to determine whether any portion of the cloned gene contains sequences which will be processed
out as introns by the host's spliceosome machinery. If so, site-directed mutagenesis of the "intron"
region may be conducted to prevent losing a portion of the genetic message as a false intron code.
Reed and Maniatis, Cell 41:95-105, 1985.

The vector can be microinjected directly into plant cells by use of micropipettes to mechanically
transfer the recombinant DNA. Crossway, Mol. Gen. Genet, 202:179-185, 1985. The genetic
material may also be transferred into the plant cell by using polyethylene glycol, Krens, et al.,
Nature, 296, 72-74, 1982. Another method of introduction of nucleic acid segments is high
velocity ballistic penetration by small particles with the nucleic acid either within the matrix of
small beads or particles, or on the surface, Klein, et al., Nature, 327, 70-73, 1987 and Knudsen and
Muller, 1991, Planta, 185:330-336 teaching particle bombardment of barley endosperm to create
transgenic barley. Yet another method of introduction would be fusion of protoplasts with other
entities, either minicells, cells, lysosomes or other fusible lipid-surfaced bodies, Fraley, et al., Proc.
Natl. Acad. Sci. USA, 79, 1859-1863, 1982.

The vector may also be introduced into the plant cells by electroporation. (Fromm et al., Proc. Natl.
Acad. Sci. USA 82:5824, 1985). In this technique, plant protoplasts are electroporated in the
presence of plasmids containing the gene construct. Electrical impulses of high field strength
reversibly permeabilize biomembranes allowing the introduction of the plasmids. Electroporated
plant protoplasts reform the cell wall, divide, and form plant callus.

All plants from which protoplasts can be isolated and cultured to give whole regenerated plants can
be transformed by the present invention so that whole plants are recovered which contain the
transferred gene. It is known that practically all plants can be regenerated from cultured cells or
tissues, including but not limited to all major species of sugarcane, sugar beet, cotton, fruit and
other trees, legumes and vegetables. Some suitable plants include, for example, species from the
genera Fragaria, Lotus, Medicago, Onobrychis, Trifolium, Trigonella, Vigna, Citrus, Linum,
Geranium, Manihot, Daucus, Arabidopsis, Brassica, Raphanus, Sinapis, Atropa, Capsicum,
Datura, Hyoscyamus, Lycopersicon, Nicotiana, Solanum, Petunia, Digitalis, Majorana, Cichorium,
Helianthus, Lactuca, Bromus, Asparagus, Antirrhinum, Hemerocallis, Nemesia, Pelargonium,
Panicum, Pennisetum, Ranunculus, Senecio, Salpiglossis, Cucumis, Browallia, Glycine, Lolium,
Zea, Triticum, Sorghum, and Datura.

Means for regeneration vary from species to species of plants, but generally a suspension of
transformed protoplasts containing copies of the heterologous gene is first provided. Callus tissue
is formed and shoots may be induced from callus and subsequently rooted. Alternatively, embryo
formation can be induced from the protoplast suspension. These embryos germinate as natural
embryos to form plants. The culture media will generally contain various amino acids and
hormones, such as auxin and cytokinins. It is also advantageous to add glutamic acid and proline
to the medium, especially for such species as corn and alfalfa. Shoots and roots normally develop
simultaneously. Efficient regeneration will depend on the medium, on the genotype, and on the
history of the culture. If these three variables are controlled, then regeneration is fully reproducible
and repeatable.

In some plant cell culture systems, the desired protein of the invention may be excreted or,
alternatively, the protein may be extracted from the whole plant. Where the desired protein of the
invention is secreted into the medium, it may be collected. Alternatively, the embryos and
embryoless-half seeds or other plant tissue may be mechanically disrupted to release any secreted
protein between cells and tissues. The mixture may be suspended in a buffer solution to retrieve
soluble proteins. Conventional protein isolation and purification methods will then be used to
purify the recombinant protein. Parameters of time, temperature, pH, oxygen, and volumes will be
adjusted through routine methods to optimize expression and recovery of heterologous protein.
iv. Bacterial Systems

Bacterial expression techniques are known in the art. A bacterial promoter is any DNA sequence
capable of binding bacterial RNA polymerase and initiating the downstream (3') transcription of
a coding sequence (eg. structural gene) into mRNA. A promoter will have a transcription initiation
region which is usually placed proximal to the 5' end of the coding sequence. This transcription
initiation region usually includes an RNA polymerase binding site and a transcription initiation site.
A bacterial promoter may also have a second domain called an operator, that may overlap an
adjacent RNA polymerase binding site at which RNA synthesis begins. The operator permits
negative regulated (inducible) transcription, as a gene repressor protein may bind the operator and
thereby inhibit transcription of a specific gene. Constitutive expression may occur in the absence
of negative regulatory elements, such as the operator. In addition, positive regulation may be
achieved by a gene activator protein binding sequence, which, if present, is usually proximal (5')
to the RNA polymerase binding sequence. An example of a gene activator protein is the catabolite
activator protein (CAP), which helps initiate transcription of the lac operon in Escherichia coli (E.
coli) [Raibaud et al. (1984) Annu. Rev. Genet. 18:173]. Regulated expression may therefore be
either positive or negative, thereby either enhancing or reducing transcription.

Sequences encoding metabolic pathway enzymes provide particularly useful promoter sequences.
Examples include promoter sequences derived from sugar metabolizing enzymes, such as galactose,
lactose (lac) [Chang et al. (1977) Nature 198:1056], and maltose. Additional examples include
promoter sequences derived from biosynthetic enzymes such as tryptophan (trp) [Goeddel et al.
(1980) Nuc. Acids Res. 8:4057; Yelverton et al. (1981) Nucl. Acids Res. 9:731; US
patent 4,738,921; EP-A-0036776 and EP-A-0121775]. The β-lactamase (bla) promoter system
[Weissmann (1981) "The cloning of interferon and other mistakes." In Interferon 3 (ed. I. Gresser)],
bacteriophage lambda PL [Shimatake et al. (1981) Nature 292:128] and T5 [US patent 4,689,406]
promoter systems also provide useful promoter sequences.

In addition, synthetic promoters which do not occur in nature also function as bacterial promoters.
For example, transcription activation sequences of one bacterial or bacteriophage promoter may
be joined with the operon sequences of another bacterial or bacteriophage promoter, creating a
synthetic hybrid promoter [US patent 4,551,433]. For example, the tac promoter is a hybrid trp-lac
promoter comprised of both trp promoter and lac operon sequences that is regulated by the lac
repressor [Amann et al. (1983) Gene 25:167; de Boer et al. (1983) Proc. Natl. Acad. Sci. 80:21].

Furthermore, a bacterial promoter can include naturally occurring promoters of non-bacterial origin
that have the ability to bind bacterial RNA polymerase and initiate transcription. A naturally
occurring promoter of non-bacterial origin can also be coupled with a compatible RNA polymerase
to produce high levels of expression of some genes in prokaryotes. The bacteriophage T7 RNA
polymerase/promoter system is an example of a coupled promoter system [Studier et al. (1986) J.
Mol. Biol. 189:113; Tabor et al. (1985) Proc. Natl. Acad. Sci. 82:1074]. In addition, a hybrid
promoter can also be comprised of a bacteriophage promoter and an E. coli operator region
(EPO-A-0 267 851).

In addition to a functioning promoter sequence, an efficient ribosome binding site is also useful for
the expression of foreign genes in prokaryotes. In E. coli, the ribosome binding site is called the
Shine-Dalgarno (SD) sequence and includes an initiation codon (ATG) and a sequence 3-9
nucleotides in length located 3-11 nucleotides upstream of the initiation codon [Shine et al. (1975)
Nature 254:34]. The SD sequence is thought to promote binding of mRNA to the ribosome by the
pairing of bases between the SD sequence and the 3' end of E. coli 16S rRNA [Steitz et al. (1979)
"Genetic signals and nucleotide sequences in messenger RNA." In Biological Regulation and
Development: Gene Expression (ed. R.F. Goldberger)]. To express eukaryotic genes and
prokaryotic genes with weak ribosome-binding site [Sambrook et al. (1989) "Expression of cloned
genes in Escherichia coli." In Molecular Cloning: A Laboratory Manual].
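
The spacing rule for the SD sequence (3-9 nucleotides in length, 3-11 nucleotides upstream of the initiation codon) can be checked on a candidate construct with a short script. The sketch below is illustrative only. It assumes a consensus of TAAGGAGGT (the complement of the 3' end of E. coli 16S rRNA), exact matching, and a gap counted from the last SD nucleotide to the A of the ATG; none of these choices is specified in the text, and the function name find_shine_dalgarno is hypothetical.

```python
from typing import Optional, Tuple


def find_shine_dalgarno(
    seq: str,
    atg_pos: int,
    consensus: str = "TAAGGAGGT",  # assumed consensus, not taken from the text
    min_len: int = 3,
    max_len: int = 9,
    min_gap: int = 3,
    max_gap: int = 11,
) -> Optional[Tuple[int, str, int]]:
    """Return (start, motif, gap) for the longest exact consensus fragment
    found min_gap..max_gap nucleotides upstream of the ATG, or None."""
    seq = seq.upper().replace("U", "T")
    if seq[atg_pos:atg_pos + 3] != "ATG":
        raise ValueError("no ATG initiation codon at atg_pos")
    # Try longer fragments first so that the first hit is the longest match.
    for length in range(min(max_len, len(consensus)), min_len - 1, -1):
        for offset in range(len(consensus) - length + 1):
            motif = consensus[offset:offset + length]
            for gap in range(min_gap, max_gap + 1):
                start = atg_pos - gap - length
                if start >= 0 and seq[start:start + length] == motif:
                    return start, motif, gap
    return None


if __name__ == "__main__":
    construct = "CC" + "TAAGGAGGT" + "ATATACAT" + "ATGAAA"
    print(find_shine_dalgarno(construct, atg_pos=19))
```

A result of None suggests, under these assumptions, that the construct lacks a recognisable SD fragment at a suitable distance from the initiation codon.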

A DNA molecule may be expressed intracellularly. A promoter sequence may be directly linked
with the DNA molecule, in which case the first amino acid at the N-terminus will always be a
methionine, which is encoded by the ATG start codon. If desired, methionine at the N-terminus
may be cleaved from the protein by in vitro incubation with cyanogen bromide or by either in vivo
or in vitro incubation with a bacterial methionine N-terminal peptidase (EPO-A-0 219 237).
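
Because cyanogen bromide cleaves the peptide bond on the C-terminal side of methionine residues generally, it can be useful to check whether a protein contains internal methionines before choosing this route. The following is a minimal sketch that treats the protein as a string of one-letter codes; the function name cnbr_fragments is illustrative, and the chemical conversion of the methionine residue during cleavage is not modelled.

```python
from typing import List


def cnbr_fragments(protein: str) -> List[str]:
    """Split a one-letter amino acid sequence after every methionine (M)."""
    fragments: List[str] = []
    current = ""
    for residue in protein.upper():
        current += residue
        if residue == "M":
            # Cleavage occurs C-terminal to methionine, so M ends the fragment.
            fragments.append(current)
            current = ""
    if current:
        fragments.append(current)
    return fragments


if __name__ == "__main__":
    # Only the N-terminal methionine: the mature protein stays intact.
    print(cnbr_fragments("MKTAYIAK"))
    # An internal methionine would also be cleaved.
    print(cnbr_fragments("MKTMAYI"))
```

If more than two fragments are returned, the protein contains an internal methionine and enzymatic removal with a methionine N-terminal peptidase may be the more suitable option.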

Fusion proteins provide an alternative to direct expression. Usually, a DNA sequence encoding the
N-terminal portion of an endogenous bacterial protein, or other stable protein, is fused to the 5' end
of heterologous coding sequences. Upon expression, this construct will provide a fusion of the two
amino acid sequences. For example, the bacteriophage lambda cell gene can be linked at the 5'
terminus of a foreign gene and expressed in bacteria. The resulting fusion protein preferably retains
a site for a processing enzyme (factor Xa) to cleave the bacteriophage protein from the foreign gene
[Nagai et al. (1984) Nature 309:810]. Fusion proteins can also be made with sequences from the
lacZ [Jia et al. (1987) Gene 60:197], trpE [Allen et al. (1987) J. Biotechnol. 5:93; Makoff et al.
(1989) J. Gen. Microbiol. 135:11], and Chey [EP-A-0 324 647] genes. The DNA sequence at the
junction of the two amino acid sequences may or may not encode a cleavable site. Another example
is a ubiquitin fusion protein. Such a fusion protein is made with the ubiquitin region that preferably
retains a site for a processing enzyme (eg. ubiquitin specific processing-protease) to cleave the
ubiquitin from the foreign protein. Through this method, native foreign protein can be isolated
[Miller et al. (1989) Bio/Technology 7:698].

Alternatively, foreign proteins can also be secreted from the cell by creating chimeric DNA molecules
that encode a fusion protein comprised of a signal peptide sequence fragment that provides for secretion
of the foreign protein in bacteria [US patent 4,336,336]. The signal sequence fragment usually encodes
a signal peptide comprised of hydrophobic amino acids which direct the secretion of the protein from the
cell. The protein is either secreted into the growth media (gram-positive bacteria) or into the periplasmic
space, located between the inner and outer membrane of the cell (gram-negative bacteria). Preferably
there are processing sites, which can be cleaved either in vivo or in vitro, encoded between the signal
peptide fragment and the foreign gene.
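
The hydrophobic character of a candidate signal peptide can be examined numerically. The sketch below scores a sliding window using the Kyte-Doolittle hydropathy scale. The choice of that scale, the window length of 8 residues, and the function names are assumptions made for illustration and are not taken from the cited references.

```python
from typing import Tuple

# Kyte-Doolittle hydropathy values (assumed scale), one-letter amino acid codes.
KYTE_DOOLITTLE = {
    "A": 1.8, "R": -4.5, "N": -3.5, "D": -3.5, "C": 2.5,
    "Q": -3.5, "E": -3.5, "G": -0.4, "H": -3.2, "I": 4.5,
    "L": 3.8, "K": -3.9, "M": 1.9, "F": 2.8, "P": -1.6,
    "S": -0.8, "T": -0.7, "W": -0.9, "Y": -1.3, "V": 4.2,
}


def mean_hydropathy(peptide: str) -> float:
    """Average hydropathy of a peptide; higher values are more hydrophobic."""
    peptide = peptide.upper()
    if not peptide:
        raise ValueError("peptide must not be empty")
    return sum(KYTE_DOOLITTLE[aa] for aa in peptide) / len(peptide)


def most_hydrophobic_window(protein: str, window: int = 8) -> Tuple[int, float]:
    """Return (start index, mean hydropathy) of the most hydrophobic window."""
    protein = protein.upper()
    if len(protein) < window:
        raise ValueError("protein is shorter than the window")
    scores = [
        (mean_hydropathy(protein[i:i + window]), i)
        for i in range(len(protein) - window + 1)
    ]
    # max() keeps the first window in case of a tie.
    best_score, best_start = max(scores, key=lambda item: item[0])
    return best_start, best_score


if __name__ == "__main__":
    start, score = most_hydrophobic_window("MKKLLLLLLLLDDD")
    print(f"most hydrophobic window starts at residue {start + 1}, score {score:.2f}")
```

A strongly positive window near the N-terminus is what would be expected, under these assumptions, for a fragment that encodes a hydrophobic signal peptide.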

DNA encoding suitable signal sequences can be derived from genes for secreted bacterial proteins,
such as the E. coli outer membrane protein gene (ompA) [Masui et al. (1983), in: Experimental
Manipulation of Gene Expression; Ghrayeb et al. (1984) EMBO J. 3:2437] and the E. coli alkaline
phosphatase signal sequence (phoA) [Oka et al. (1985) Proc. Natl. Acad. Sci. 82:7212]. As an
additional example, the signal sequence of the alpha-amylase gene from various Bacillus strains
can be used to secrete heterologous proteins from B. subtilis [Palva et al. (1982) Proc. Natl. Acad.
Sci. USA 79:5582; EP-A-0 244 042].

Usually, transcription termination sequences recognized by bacteria are regulatory regions located
3' to the translation stop codon, and thus together with the promoter flank the coding sequence.
These sequences direct the transcription of an mRNA which can be translated into the polypeptide
encoded by the DNA. Transcription termination sequences frequently include DNA sequences of
about 50 nucleotides capable of forming stem loop structures that aid in terminating transcription.
Examples include transcription termination sequences derived from genes with strong promoters,
such as the trp gene in E. coli as well as other biosynthetic genes.

Usually, the above described components, comprising a promoter, signal sequence (if desired),
coding sequence of interest, and transcription termination sequence, are put together into expression
constructs. Expression constructs are often maintained in a replicon, such as an extrachromosomal
element (eg. plasmids) capable of stable maintenance in a host, such as bacteria. The replicon will
have a replication system, thus allowing it to be maintained in a prokaryotic host either for
expression or for cloning and amplification. In addition, a replicon may be either a high or low
copy number plasmid. A high copy number plasmid will generally have a copy number ranging
from about 5 to about 200, and usually about 10 to about 150. A host containing a high copy
number plasmid will preferably contain at least about 10, and more preferably at least about 20
plasmids. Either a high or low copy number vector may be selected, depending upon the effect of
the vector and the foreign protein on the host.

Alternatively, the expression constructs can be integrated into the bacterial genome with an
integrating vector. Integrating vectors usually contain at least one sequence homologous to the
bacterial chromosome that allows the vector to integrate. Integrations appear to result from
recombinations between homologous DNA in the vector and the bacterial chromosome. For
example, integrating vectors constructed with DNA from various Bacillus strains integrate into the
Bacillus chromosome (EP-A-0 127 328). Integrating vectors may also be comprised of
bacteriophage or transposon sequences.

Usually, extrachromosomal and integrating expression constructs may contain selectable markers
to allow for the selection of bacterial strains that have been transformed. Selectable markers can
be expressed in the bacterial host and may include genes which render bacteria resistant to drugs
such as ampicillin, chloramphenicol, erythromycin, kanamycin (neomycin), and tetracycline
[Davies et al. (1978) Annu. Rev. Microbiol. 32:469]. Selectable markers may also include
biosynthetic genes, such as those in the histidine, tryptophan, and leucine biosynthetic pathways.

Alternatively, some of the above described components can be put together in transformation
vectors. Transformation vectors are usually comprised of a selectable marker that is either
maintained in a replicon or developed into an integrating vector, as described above.

Expression and transformation vectors, either extra-chromosomal replicons or integrating vectors,
have been developed for transformation into many bacteria. For example, expression vectors have
been developed for, inter alia, the following bacteria: Bacillus subtilis [Palva et al. (1982) Proc.
Natl. Acad. Sci. USA 79:5582; EP-A-0 036 259 and EP-A-0 063 953; WO 84/04541], Escherichia
coli [Shimatake et al. (1981) Nature 292:128; Amann et al. (1985) Gene 40:183; Studier et al.
(1986) J. Mol. Biol. 189:113; EP-A-0 036 776, EP-A-0 136 829 and EP-A-0 136 907],
Streptococcus cremoris [Powell et al. (1988) Appl. Environ. Microbiol. 54:655]; Streptococcus
lividans [Powell et al. (1988) Appl. Environ. Microbiol. 54:655], Streptomyces lividans [US patent
4,745,056].

Methods of introducing exogenous DNA into bacterial hosts are well-known in the art, and usually
include either the transformation of bacteria treated with CaCl2 or other agents, such as divalent
cations and DMSO. DNA can also be introduced into bacterial cells by electroporation.
Transformation procedures usually vary with the bacterial species to be transformed. See eg.
[Masson et al. (1989) FEMS Microbiol. Lett. 60:273; Palva et al. (1982) Proc. Natl. Acad. Sci. USA
79:5582; EP-A-0 036 259 and EP-A-0 063 953; WO 84/04541, Bacillus], [Miller et al. (1988)
Proc. Natl. Acad. Sci. 85:856; Wang et al. (1990) J. Bacteriol. 172:949, Campylobacter], [Cohen
et al. (1973) Proc. Natl. Acad. Sci. 69:2110; Dower et al. (1988) Nucleic Acids Res. 16:6127;
Kushner (1978) "An improved method for transformation of Escherichia coli with ColE1-derived
plasmids." In Genetic Engineering: Proceedings of the International Symposium on Genetic
Engineering (eds. H.W. Boyer and S. Nicosia); Mandel et al. (1970) J. Mol. Biol. 53:159; Taketo
(1988) Biochim. Biophys. Acta 949:318; Escherichia], [Chassy et al. (1987) FEMS Microbiol. Lett.
44:173, Lactobacillus]; [Fiedler et al. (1988) Anal. Biochem 170:38, Pseudomonas]; [Augustin et
al. (1990) FEMS Microbiol. Lett. 66:203, Staphylococcus], [Barany et al. (1980) J. Bacteriol.
144:698; Harlander (1987) "Transformation of Streptococcus lactis by electroporation," in:
Streptococcal Genetics (ed. J. Ferretti and R. Curtiss III); Perry et al. (1981) Infect. Immun.
32:1295; Powell et al. (1988) Appl. Environ. Microbiol. 54:655; Somkuti et al. (1987) Proc. 4th
Eur. Cong. Biotechnology 1:412, Streptococcus].
v. Yeast Expression

Yeast expression systems are also known to one of ordinary skill in the art. A yeast promoter is any
DNA sequence capable of binding yeast RNA polymerase and initiating the downstream (3')
transcription of a coding sequence (eg. structural gene) into mRNA. A promoter will have a
transcription initiation region which is usually placed proximal to the 5' end of the coding sequence.
This transcription initiation region usually includes an RNA polymerase binding site (the "TATA
Box") and a transcription initiation site. A yeast promoter may also have a second domain called
an upstream activator sequence (UAS), which, if present, is usually distal to the structural gene.
The UAS permits regulated (inducible) expression. Constitutive expression occurs in the absence
of a UAS. Regulated expression may be either positive or negative, thereby either enhancing or
reducing transcription.

Yeast is a fermenting organism with an active metabolic pathway, therefore sequences encoding
enzymes in the metabolic pathway provide particularly useful promoter sequences. Examples
include alcohol dehydrogenase (ADH) (EP-A-0 284 044), enolase, glucokinase, glucose-6-
phosphate isomerase, glyceraldehyde-3-phosphate-dehydrogenase (GAP or GAPDH), hexokinase,
phosphofructokinase, 3-phosphoglycerate mutase, and pyruvate kinase (PyK) (EPO-A-0 329 203).
The yeast PHO5 gene, encoding acid phosphatase, also provides useful promoter sequences
[Myanohara et al. (1983) Proc. Natl. Acad. Sci. USA 80:1].

In addition, synthetic promoters which do not occur in nature also function as yeast promoters. For
example, UAS sequences of one yeast promoter may be joined with the transcription activation
region of another yeast promoter, creating a synthetic hybrid promoter. Examples of such hybrid
promoters include the ADH regulatory sequence linked to the GAP transcription activation region
(US Patent Nos. 4,876,197 and 4,880,734). Other examples of hybrid promoters include promoters
which consist of the regulatory sequences of either the ADH2, GAL4, GAL10, or PHO5 genes,
combined with the transcriptional activation region of a glycolytic enzyme gene such as GAP or
PyK (EP-A-0 164 556). Furthermore, a yeast promoter can include naturally occurring promoters
of non-yeast origin that have the ability to bind yeast RNA polymerase and initiate transcription.
Examples of such promoters include, inter alia, [Cohen et al. (1980) Proc. Natl. Acad. Sci. USA
77:1078; Henikoff et al. (1981) Nature 283:835; Hollenberg et al. (1981) Curr. Topics Microbiol.
Immunol. 96:119; Hollenberg et al. (1979) "The Expression of Bacterial Antibiotic Resistance
Genes in the Yeast Saccharomyces cerevisiae," in: Plasmids of Medical, Environmental and
Commercial Importance (eds. K.N. Timmis and A. Puhler); Mercerau-Puigalon et al. (1980) Gene
11:163; Panthier et al. (1980) Curr. Genet. 2:109].

A DNA molecule may be expressed intracellularly in yeast. A promoter sequence may be directly
linked with the DNA molecule, in which case the first amino acid at the N-terminus of the
recombinant protein will always be a methionine, which is encoded by the ATG start codon. If
desired, methionine at the N-terminus may be cleaved from the protein by in vitro incubation with
cyanogen bromide.

Fusion proteins provide an alternative for yeast expression systems, as well as in mammalian,
baculovirus, and bacterial expression systems. Usually, a DNA sequence encoding the N-terminal
portion of an endogenous yeast protein, or other stable protein, is fused to the 5' end of
heterologous coding sequences. Upon expression, this construct will provide a fusion of the two
amino acid sequences. For example, the yeast or human superoxide dismutase (SOD) gene can be
linked at the 5' terminus of a foreign gene and expressed in yeast. The DNA sequence at the
junction of the two amino acid sequences may or may not encode a cleavable site. See eg. EP-A-0
196 056. Another example is a ubiquitin fusion protein. Such a fusion protein is made with the
ubiquitin region that preferably retains a site for a processing enzyme (eg. ubiquitin-specific
processing protease) to cleave the ubiquitin from the foreign protein. Through this method,
therefore, native foreign protein can be isolated (eg. WO88/024066).

Alternatively, foreign proteins can also be secreted from the cell into the growth media by creating
chimeric DNA molecules that encode a fusion protein comprised of a leader sequence fragment that
provides for secretion in yeast of the foreign protein. Preferably, there are processing sites encoded
between the leader fragment and the foreign gene that can be cleaved either in vivo or in vitro. The
leader sequence fragment usually encodes a signal peptide comprised of hydrophobic amino acids
which direct the secretion of the protein from the cell.

DNA encoding suitable signal sequences can be derived from genes for secreted yeast proteins,
such as the yeast invertase gene (EP-A-0 012 873; JPO. 62,096,086) and the A-factor gene (US
patent 4,588,684). Alternatively, leaders of non-yeast origin, such as an interferon leader, exist that
also provide for secretion in yeast (EP-A-0 060 057).

A preferred class of secretion leaders are those that employ a fragment of the yeast alpha-factor
gene, which contains both a "pre" signal sequence, and a "pro" region. The types of alpha-factor
fragments that can be employed include the full-length pre-pro alpha factor leader (about 83 amino
acid residues) as well as truncated alpha-factor leaders (usually about 25 to about 50 amino acid
residues) (US Patents 4,546,083 and 4,870,008; EP-A-0 324 274). Additional leaders employing
an alpha-factor leader fragment that provides for secretion include hybrid alpha-factor leaders made
with a presequence of a first yeast, but a pro-region from a second yeast alpha-factor (eg. see WO
89/02463).

Usually, transcription termination sequences recognized by yeast are regulatory regions located 3'
to the translation stop codon, and thus together with the promoter flank the coding sequence. These
sequences direct the transcription of an mRNA which can be translated into the polypeptide
encoded by the DNA. Examples include transcription terminator sequences and other
yeast-recognized termination sequences, such as those of genes coding for glycolytic enzymes.

Usually, the above described components, comprising a promoter, leader (if desired), coding
sequence of interest, and transcription termination sequence, are put together into expression
constructs. Expression constructs are often maintained in a replicon, such as an extrachromosomal
element (eg. plasmids) capable of stable maintenance in a host, such as yeast or bacteria. The
replicon may have two replication systems, thus allowing it to be maintained, for example, in yeast
for expression and in a prokaryotic host for cloning and amplification. Examples of such
yeast-bacteria shuttle vectors include YEp24 [Botstein et al (1979) Gene 8:17-24], pCl/1 [Brake et al
(1984) Proc. Natl. Acad. Sci. USA 81:4642-4646], and YRp17 [Stinchcomb et al (1982) J. Mol.
Biol. 158:157]. In addition, a replicon may be either a high or low copy number plasmid. A high
copy number plasmid will generally have a copy number ranging from about 5 to about 200, and
usually about 10 to about 150. A host containing a high copy number plasmid will preferably have
at least about 10, and more preferably at least about 20. Either a high or low copy number vector
may be selected, depending upon the effect of the vector and the foreign protein on the host. See
eg. Brake et al., supra.

Alternatively, the expression constructs can be integrated into the yeast genome with an integrating
vector. Integrating vectors usually contain at least one sequence homologous to a yeast
chromosome that allows the vector to integrate, and preferably contain two homologous sequences
flanking the expression construct. Integrations appear to result from recombinations between
homologous DNA in the vector and the yeast chromosome [Orr-Weaver et al (1983) Methods in
Enzymol. 101:228-245]. An integrating vector may be directed to a specific locus in yeast by
selecting the appropriate homologous sequence for inclusion in the vector. See Orr-Weaver et al,
supra. One or more expression constructs may integrate, possibly affecting levels of recombinant
protein produced [Rine et al (1983) Proc. Natl. Acad. Sci. USA 80:6750]. The chromosomal
sequences included in the vector can occur either as a single segment in the vector, which results
in the integration of the entire vector, or two segments homologous to adjacent segments in the
chromosome and flanking the expression construct in the vector, which can result in the stable
integration of only the expression construct.

Usually, extrachromosomal and integrating expression constructs may contain selectable markers
to allow for the selection of yeast strains that have been transformed. Selectable markers may
include biosynthetic genes that can be expressed in the yeast host, such as ADE2, HIS4, LEU2,
TRP1, and ALG7, and the G418 resistance gene, which confer resistance in yeast cells to
tunicamycin and G418, respectively. In addition, a suitable selectable marker may also provide
yeast with the ability to grow in the presence of toxic compounds, such as metal. For example, the
presence of CUP1 allows yeast to grow in the presence of copper ions [Butt et al (1987) Microbiol.
Rev. 51:351].

Alternatively, some of the above described components can be put together into transformation 
vectors. Transformation vectors are usually comprised of a selectable marker that is either 
maintained in a replicon or developed into an integrating vector, as described above. 

Expression and transformation vectors, either extrachromosomal replicons or integrating vectors,
have been developed for transformation into many yeasts. For example, expression vectors have
been developed for, inter alia, the following yeasts: Candida albicans [Kurtz et al (1986) Mol.
Cell. Biol. 6:142], Candida maltosa [Kunze et al (1985) J. Basic Microbiol. 25:141], Hansenula
polymorpha [Gleeson et al (1986) J. Gen. Microbiol. 132:3459; Roggenkamp et al (1986) Mol.
Gen. Genet. 202:302], Kluyveromyces fragilis [Das et al (1984) J. Bacteriol. 158:1165],
Kluyveromyces lactis [De Louvencourt et al (1983) J. Bacteriol. 154:737; Van den Berg et al
(1990) Bio/Technology 8:135], Pichia guillerimondii [Kunze et al (1985) J. Basic Microbiol.
25:141], Pichia pastoris [Cregg et al (1985) Mol. Cell. Biol. 5:3376; US Patent Nos. 4,837,148
and 4,929,555], Saccharomyces cerevisiae [Hinnen et al (1978) Proc. Natl. Acad. Sci. USA
75:1929; Ito et al (1983) J. Bacteriol. 153:163], Schizosaccharomyces pombe [Beach and Nurse
(1981) Nature 300:706], and Yarrowia lipolytica [Davidow et al (1985) Curr. Genet. 10:39;
Gaillardin et al (1985) Curr. Genet. 10:49].

Methods of introducing exogenous DNA into yeast hosts are well-known in the art, and usually
include either the transformation of spheroplasts or of intact yeast cells treated with alkali cations.
Transformation procedures usually vary with the yeast species to be transformed. See eg. [Kurtz
et al (1986) Mol. Cell. Biol. 6:142; Kunze et al (1985) J. Basic Microbiol. 25:141; Candida];
[Gleeson et al (1986) J. Gen. Microbiol. 132:3459; Roggenkamp et al (1986) Mol. Gen. Genet.
202:302; Hansenula]; [Das et al (1984) J. Bacteriol. 158:1165; De Louvencourt et al (1983) J.
Bacteriol. 154:1165; Van den Berg et al (1990) Bio/Technology 8:135; Kluyveromyces]; [Cregg
et al (1985) Mol. Cell. Biol. 5:3376; Kunze et al (1985) J. Basic Microbiol. 25:141; US Patent
Nos. 4,837,148 and 4,929,555; Pichia]; [Hinnen et al (1978) Proc. Natl. Acad. Sci. USA 75:1929;
Ito et al (1983) J. Bacteriol. 153:163; Saccharomyces]; [Beach and Nurse (1981) Nature 300:706;
Schizosaccharomyces]; [Davidow et al (1985) Curr. Genet. 10:39; Gaillardin et al (1985) Curr.
Genet. 10:49; Yarrowia].

Antibodies 

As used herein, the term "antibody" refers to a polypeptide or group of polypeptides composed of
at least one antibody combining site. An "antibody combining site" is the three-dimensional
binding space with an internal surface shape and charge distribution complementary to the features
of an epitope of an antigen, which allows a binding of the antibody with the antigen. "Antibody"
includes, for example, vertebrate antibodies, hybrid antibodies, chimeric antibodies, humanised
antibodies, altered antibodies, univalent antibodies, Fab proteins, and single domain antibodies.

Antibodies against the proteins of the invention are useful for affinity chromatography, 
immunoassays, and distinguishing/identifying Neisserial proteins. 

Antibodies to the proteins of the invention, both polyclonal and monoclonal, may be prepared by
conventional methods. In general, the protein is first used to immunize a suitable animal, preferably
a mouse, rat, rabbit or goat. Rabbits and goats are preferred for the preparation of polyclonal sera
due to the volume of serum obtainable, and the availability of labeled anti-rabbit and anti-goat
antibodies. Immunization is generally performed by mixing or emulsifying the protein in saline,
preferably in an adjuvant such as Freund's complete adjuvant, and injecting the mixture or
emulsion parenterally (generally subcutaneously or intramuscularly). A dose of 50-200 µg/injection
is typically sufficient. Immunization is generally boosted 2-6 weeks later with one or more
injections of the protein in saline, preferably using Freund's incomplete adjuvant. One may
alternatively generate antibodies by in vitro immunization using methods known in the art, which
for the purposes of this invention is considered equivalent to in vivo immunization. Polyclonal
antisera is obtained by bleeding the immunized animal into a glass or plastic container, incubating
the blood at 25°C for one hour, followed by incubating at 4°C for 2-18 hours. The serum is
recovered by centrifugation (eg. 1,000g for 10 minutes). About 20-50 ml per bleed may be obtained
from rabbits.

Monoclonal antibodies are prepared using the standard method of Kohler & Milstein [Nature
(1975) 256:495-96], or a modification thereof. Typically, a mouse or rat is immunized as described
above. However, rather than bleeding the animal to extract serum, the spleen (and optionally
several large lymph nodes) is removed and dissociated into single cells. If desired, the spleen cells
may be screened (after removal of nonspecifically adherent cells) by applying a cell suspension to
a plate or well coated with the protein antigen. B-cells expressing membrane-bound
immunoglobulin specific for the antigen bind to the plate, and are not rinsed away with the rest of
the suspension. Resulting B-cells, or all dissociated spleen cells, are then induced to fuse with
myeloma cells to form hybridomas, and are cultured in a selective medium (eg. hypoxanthine,
aminopterin, thymidine medium, "HAT"). The resulting hybridomas are plated by limiting dilution,
and are assayed for the production of antibodies which bind specifically to the immunizing antigen
(and which do not bind to unrelated antigens). The selected MAb-secreting hybridomas are then
cultured either in vitro (eg. in tissue culture bottles or hollow fiber reactors), or in vivo (as ascites
in mice).

If desired, the antibodies (whether polyclonal or monoclonal) may be labeled using conventional
techniques. Suitable labels include fluorophores, chromophores, radioactive atoms (particularly ³²P
and ¹²⁵I), electron-dense reagents, enzymes, and ligands having specific binding partners. Enzymes
are typically detected by their activity. For example, horseradish peroxidase is usually detected by its
ability to convert 3,3',5,5'-tetramethylbenzidine (TMB) to a blue pigment, quantifiable with a
spectrophotometer. "Specific binding partner" refers to a protein capable of binding a ligand molecule
with high specificity, as for example in the case of an antigen and a monoclonal antibody specific
therefor. Other specific binding partners include biotin and avidin or streptavidin, IgG and protein A,
and the numerous receptor-ligand couples known in the art. It should be understood that the above
description is not meant to categorize the various labels into distinct classes, as the same label may
serve in several different modes. For example, ¹²⁵I may serve as a radioactive label or as an
electron-dense reagent. HRP may serve as enzyme or as antigen for a MAb. Further, one may combine
various labels for desired effect. For example, MAbs and avidin also require labels in the practice of
this invention: thus, one might label a MAb with biotin, and detect its presence with avidin labeled
with ¹²⁵I, or with an anti-biotin MAb labeled with HRP. Other permutations and possibilities will be
readily apparent to those of ordinary skill in the art, and are considered as equivalents within the scope
of the instant invention.

Pharmaceutical Compositions 

Pharmaceutical compositions can comprise either polypeptides, antibodies, or nucleic acid of the
invention. The pharmaceutical compositions will comprise a therapeutically effective amount of
either polypeptides, antibodies, or polynucleotides of the claimed invention.

The term "therapeutically effective amount" as used herein refers to an amount of a therapeutic
agent to treat, ameliorate, or prevent a desired disease or condition, or to exhibit a detectable
therapeutic or preventative effect. The effect can be detected by, for example, chemical markers or
antigen levels. Therapeutic effects also include reduction in physical symptoms, such as decreased
body temperature. The precise effective amount for a subject will depend upon the subject's size
and health, the nature and extent of the condition, and the therapeutics or combination of
therapeutics selected for administration. Thus, it is not useful to specify an exact effective amount
in advance. However, the effective amount for a given situation can be determined by routine
experimentation and is within the judgement of the clinician.

For purposes of the present invention, an effective dose will be from about 0.01 mg/kg to 50 mg/kg
or 0.05 mg/kg to about 10 mg/kg of the DNA constructs in the individual to which it is administered.
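
As a purely arithmetical illustration of the per-kilogram ranges stated above, the following minimal Python sketch scales each range to an absolute amount for one individual. The names `DoseRange`, `BROAD_RANGE` and `NARROW_RANGE`, and the 70 kg body weight, are hypothetical and chosen only for the example; the sketch is not a dosing method and does not replace the routine experimentation and clinical judgement referred to in the text.

```python
from dataclasses import dataclass
from typing import Tuple


@dataclass(frozen=True)
class DoseRange:
    """A dose range in mg of DNA construct per kg of body weight."""
    low_mg_per_kg: float
    high_mg_per_kg: float

    def absolute_mg(self, body_weight_kg: float) -> Tuple[float, float]:
        """Scale the per-kg range to an absolute range (mg) for one individual."""
        if body_weight_kg <= 0:
            raise ValueError("body weight must be positive")
        return (self.low_mg_per_kg * body_weight_kg,
                self.high_mg_per_kg * body_weight_kg)


# The two ranges quoted in the text; both are approximate ("about").
BROAD_RANGE = DoseRange(0.01, 50.0)
NARROW_RANGE = DoseRange(0.05, 10.0)

if __name__ == "__main__":
    weight_kg = 70.0  # hypothetical body weight, not taken from the text
    for name, dose_range in (("broad", BROAD_RANGE), ("narrow", NARROW_RANGE)):
        low, high = dose_range.absolute_mg(weight_kg)
        print(f"{name}: {low:.2f} mg to {high:.2f} mg")
```

The calculation is a simple multiplication of each bound by body weight; any real dose would depend on the factors listed in the preceding paragraph.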

A pharmaceutical composition can also contain a pharmaceutically acceptable carrier. The term
"pharmaceutically acceptable carrier" refers to a carrier for administration of a therapeutic agent, such
as antibodies or a polypeptide, genes, and other therapeutic agents. The term refers to any
pharmaceutical carrier that does not itself induce the production of antibodies harmful to the
individual receiving the composition, and which may be administered without undue toxicity. Suitable
carriers may be large, slowly metabolized macromolecules such as proteins, polysaccharides,
polylactic acids, polyglycolic acids, polymeric amino acids, amino acid copolymers, and inactive virus
particles. Such carriers are well known to those of ordinary skill in the art.

Pharmaceutically acceptable salts can be used therein, for example, mineral acid salts such as
hydrochlorides, hydrobromides, phosphates, sulfates, and the like; and the salts of organic acids
such as acetates, propionates, malonates, benzoates, and the like. A thorough discussion of
pharmaceutically acceptable excipients is available in Remington's Pharmaceutical Sciences (Mack
Pub. Co., N.J. 1991).

Pharmaceutically acceptable carriers in therapeutic compositions may contain liquids such as water,
saline, glycerol and ethanol. Additionally, auxiliary substances, such as wetting or emulsifying agents,
pH buffering substances, and the like, may be present in such vehicles. Typically, the therapeutic
compositions are prepared as injectables, either as liquid solutions or suspensions; solid forms suitable
for solution in, or suspension in, liquid vehicles prior to injection may also be prepared. Liposomes
are included within the definition of a pharmaceutically acceptable carrier.

Delivery Methods 

Once formulated, the compositions of the invention can be administered directly to the subject. The 
subjects to be treated can be animals; in particular, human subjects can be treated. 

Direct delivery of the compositions will generally be accomplished by injection, either
subcutaneously, intraperitoneally, intravenously or intramuscularly or delivered to the interstitial
space of a tissue. The compositions can also be administered into a lesion. Other modes of
administration include oral and pulmonary administration, suppositories, and transdermal or
transcutaneous applications (eg. see WO98/20734), needles, and gene guns or hyposprays. Dosage
treatment may be a single dose schedule or a multiple dose schedule.

Vaccines 

Vaccines according to the invention may either be prophylactic (ie. to prevent infection) or
therapeutic (ie. to treat disease after infection).

Such vaccines comprise immunising antigen(s), immunogen(s), polypeptide(s), protein(s) or nucleic acid,
usually in combination with "pharmaceutically acceptable carriers," which include any carrier that does
not itself induce the production of antibodies harmful to the individual receiving the composition.
Suitable carriers are typically large, slowly metabolized macromolecules such as proteins,
polysaccharides, polylactic acids, polyglycolic acids, polymeric amino acids, amino acid copolymers,
lipid aggregates (such as oil droplets or liposomes), and inactive virus particles. Such carriers are well
known to those of ordinary skill in the art. Additionally, these carriers may function as
immunostimulating agents ("adjuvants"). Furthermore, the antigen or immunogen may be conjugated to
a bacterial toxoid, such as a toxoid from diphtheria, tetanus, cholera, H. pylori, etc. pathogens.

Preferred adjuvants to enhance effectiveness of the composition include, but are not limited to: (1)
aluminum salts (alum), such as aluminum hydroxide, aluminum phosphate, aluminum sulfate, etc;
(2) oil-in-water emulsion formulations (with or without other specific immunostimulating agents
such as muramyl peptides (see below) or bacterial cell wall components), such as for example (a)
MF59™ (WO 90/14837; Chapter 10 in Vaccine design: the subunit and adjuvant approach, eds.
Powell & Newman, Plenum Press 1995), containing 5% Squalene, 0.5% Tween 80, and 0.5% Span
85 (optionally containing various amounts of MTP-PE (see below), although not required)
formulated into submicron particles using a microfluidizer such as Model 110Y microfluidizer
(Microfluidics, Newton, MA), (b) SAF, containing 10% Squalane, 0.4% Tween 80, 5%
pluronic-blocked polymer L121, and thr-MDP (see below) either microfluidized into a submicron
emulsion or vortexed to generate a larger particle size emulsion, and (c) Ribi™ adjuvant system
(RAS), (Ribi Immunochem, Hamilton, MT) containing 2% Squalene, 0.2% Tween 80, and one or
more bacterial cell wall components from the group consisting of monophosphorylipid A (MPL),
trehalose dimycolate (TDM), and cell wall skeleton (CWS), preferably MPL + CWS (Detox™);
(3) saponin adjuvants, such as Stimulon™ (Cambridge Bioscience, Worcester, MA) may be used or
particles generated therefrom such as ISCOMs (immunostimulating complexes); (4) Complete
Freund's Adjuvant (CFA) and Incomplete Freund's Adjuvant (IFA); (5) cytokines, such as
interleukins (eg. IL-1, IL-2, IL-4, IL-5, IL-6, IL-7, IL-12, etc), interferons (eg. gamma interferon),
macrophage colony stimulating factor (M-CSF), tumor necrosis factor (TNF), etc; and (6) other
substances that act as immunostimulating agents to enhance the effectiveness of the composition.
Alum and MF59™ are preferred.

As mentioned above, muramyl peptides include, but are not limited to,
N-acetyl-muramyl-L-threonyl-D-isoglutamine (thr-MDP),
N-acetyl-normuramyl-L-alanyl-D-isoglutamine (nor-MDP),
N-acetylmuramyl-L-alanyl-D-isoglutaminyl-L-alanine-2-(1'-2'-dipalmitoyl-sn-glycero-3-hydroxyphosphoryloxy)-ethylamine
(MTP-PE), etc.

The immunogenic compositions (eg. the immunising antigen/immunogen/polypeptide/protein/
nucleic acid, pharmaceutically acceptable carrier, and adjuvant) typically will contain diluents, such
as water, saline, glycerol, ethanol, etc. Additionally, auxiliary substances, such as wetting or
emulsifying agents, pH buffering substances, and the like, may be present in such vehicles.

Typically, the immunogenic compositions are prepared as injectables, either as liquid solutions or
suspensions; solid forms suitable for solution in, or suspension in, liquid vehicles prior to injection
may also be prepared. The preparation also may be emulsified or encapsulated in liposomes for
enhanced adjuvant effect, as discussed above under pharmaceutically acceptable carriers.

Immunogenic compositions used as vaccines comprise an immunologically effective amount of the
antigenic or immunogenic polypeptides, as well as any other of the above-mentioned components,
as needed. By "immunologically effective amount", it is meant that the administration of that
amount to an individual, either in a single dose or as part of a series, is effective for treatment or
prevention. This amount varies depending upon the health and physical condition of the individual
to be treated, the taxonomic group of individual to be treated (eg. nonhuman primate, primate, etc),
the capacity of the individual's immune system to synthesize antibodies, the degree of protection
desired, the formulation of the vaccine, the treating doctor's assessment of the medical situation,
and other relevant factors. It is expected that the amount will fall in a relatively broad range that
can be determined through routine trials.

The immunogenic compositions are conventionally administered parenterally, eg. by injection,
either subcutaneously, intramuscularly, or transdermally/transcutaneously (eg. WO98/20734).
Additional formulations suitable for other modes of administration include oral and pulmonary
formulations, suppositories, and transdermal applications. Dosage treatment may be a single dose
schedule or a multiple dose schedule. The vaccine may be administered in conjunction with other
immunoregulatory agents.

As an alternative to protein-based vaccines, DNA vaccination may be employed [eg. Robinson &
Torres (1997) Seminars in Immunology 9:271-283; Donnelly et al (1997) Annu Rev Immunol
15:617-648; see later herein].

Gene Delivery Vehicles

Gene therapy vehicles for delivery of constructs including a coding sequence of a therapeutic of
the invention, to be delivered to the mammal for expression in the mammal, can be administered
either locally or systemically. These constructs can utilize viral or non-viral vector approaches in
in vivo or ex vivo modality. Expression of such coding sequence can be induced using endogenous
mammalian or heterologous promoters. Expression of the coding sequence in vivo can be either
constitutive or regulated.

The invention includes gene delivery vehicles capable of expressing the contemplated nucleic acid
sequences. The gene delivery vehicle is preferably a viral vector and, more preferably, a retroviral,
adenoviral, adeno-associated viral (AAV), herpes viral, or alphavirus vector. The viral vector can
also be an astrovirus, coronavirus, orthomyxovirus, papovavirus, paramyxovirus, parvovirus,
picornavirus, poxvirus, or togavirus viral vector. See generally, Jolly (1994) Cancer Gene Therapy
1:51-64; Kimura (1994) Human Gene Therapy 5:845-852; Connelly (1995) Human Gene Therapy
6:185-193; and Kaplitt (1994) Nature Genetics 6:148-153.

Retroviral vectors are well known in the art and we contemplate that any retroviral gene therapy vector
is employable in the invention, including B, C and D type retroviruses, xenotropic retroviruses (for
example, NZB-X1, NZB-X2 and NZB9-1 (see O'Neill (1985) J. Virol. 53:160)), polytropic retroviruses
eg. MCF and MCF-MLV (see Kelly (1983) J. Virol. 45:291), spumaviruses and lentiviruses. See RNA
Tumor Viruses, Second Edition, Cold Spring Harbor Laboratory, 1985.

Portions of the retroviral gene therapy vector may be derived from different retroviruses. For
example, retrovector LTRs may be derived from a Murine Sarcoma Virus, a tRNA binding site
from a Rous Sarcoma Virus, a packaging signal from a Murine Leukemia Virus, and an origin of
second strand synthesis from an Avian Leukosis Virus.

These recombinant retroviral vectors may be used to generate transduction competent retroviral
vector particles by introducing them into appropriate packaging cell lines (see US patent
5,591,624). Retrovirus vectors can be constructed for site-specific integration into host cell DNA
by incorporation of a chimeric integrase enzyme into the retroviral particle (see WO96/37626). It
is preferable that the recombinant viral vector is a replication defective recombinant virus.

Packaging cell lines suitable for use with the above-described retrovirus vectors are well known
in the art, are readily prepared (see WO95/30763 and WO92/05266), and can be used to create
producer cell lines (also termed vector cell lines or "VCLs") for the production of recombinant
vector particles. Preferably, the packaging cell lines are made from human parent cells (eg. HT1080
cells) or mink parent cell lines, which eliminates inactivation in human serum.

Preferred retroviruses for the construction of retroviral gene therapy vectors include Avian
Leukosis Virus, Bovine Leukemia Virus, Murine Leukemia Virus, Mink-Cell Focus-Inducing
Virus, Murine Sarcoma Virus, Reticuloendotheliosis Virus and Rous Sarcoma Virus. Particularly
preferred Murine Leukemia Viruses include 4070A and 1504A (Hartley and Rowe (1976) J Virol
19:19-25), Abelson (ATCC No. VR-999), Friend (ATCC No. VR-245), Graffi, Gross (ATCC No.
VR-590), Kirsten, Harvey Sarcoma Virus and Rauscher (ATCC No. VR-998) and Moloney Murine
Leukemia Virus (ATCC No. VR-190). Such retroviruses may be obtained from depositories or
collections such as the American Type Culture Collection ("ATCC") in Rockville, Maryland or
isolated from known sources using commonly available techniques.

Exemplary known retroviral gene therapy vectors employable in this invention include those
described in patent applications GB2200651, EP0415731, EP0345242, EP0334301, WO89/02468,
WO89/05349, WO89/09271, WO90/02806, WO90/07936, WO94/03622, WO93/25698,
WO93/25234, WO93/11230, WO93/10218, WO91/02805, WO91/02825, WO95/07994, US
5,219,740, US 4,405,712, US 4,861,719, US 4,980,289, US 4,777,127, US 5,591,624. See also Vile
(1993) Cancer Res 53:3860-3864; Vile (1993) Cancer Res 53:962-967; Ram (1993) Cancer Res
53:83-88; Takamiya (1992) J Neurosci Res 33:493-503; Baba (1993) J Neurosurg
79:729-735; Mann (1983) Cell 33:153; Cane (1984) Proc Natl Acad Sci 81:6349; and Miller (1990)
Human Gene Therapy 1.

Human adenoviral gene therapy vectors are also known in the art and employable in this invention.
See, for example, Berkner (1988) Biotechniques 6:616 and Rosenfeld (1991) Science 252:431, and
WO93/07283, WO93/06223, and WO93/07282. Exemplary known adenoviral gene therapy vectors
employable in this invention include those described in the above referenced documents and in
WO94/12649, WO93/03769, WO93/19191, WO94/28938, WO95/11984, WO95/00655,
WO95/27071, WO95/29993, WO95/34671, WO96/05320, WO94/08026, WO94/11506,
WO93/06223, WO94/24299, WO95/14102, WO95/24297, WO95/02697, WO94/28152,
WO94/24299, WO95/09241, WO95/25807, WO95/05835, WO94/18922 and WO95/09654.

Alternatively, administration of DNA linked to killed adenovirus as described in Curiel (1992)
Hum. Gene Ther. 3:147-154 may be employed. The gene delivery vehicles of the invention also
include adenovirus associated virus (AAV) vectors. Leading and preferred examples of such
vectors for use in this invention are the AAV-2 based vectors disclosed in Srivastava,
WO93/09239. Most preferred AAV vectors comprise the two AAV inverted terminal repeats in
which the native D-sequences are modified by substitution of nucleotides, such that at least 5 native
nucleotides and up to 18 native nucleotides, preferably at least 10 native nucleotides up to 18 native
nucleotides, most preferably 10 native nucleotides are retained and the remaining nucleotides of
the D-sequence are deleted or replaced with non-native nucleotides. The native D-sequences of the
AAV inverted terminal repeats are sequences of 20 consecutive nucleotides in each AAV inverted
terminal repeat (ie. there is one sequence at each end) which are not involved in HP formation. The
non-native replacement nucleotide may be any nucleotide other than the nucleotide found in the
native D-sequence in the same position. Other employable exemplary AAV vectors are pWP-19,
pWN-1, both of which are disclosed in Nahreini (1993) Gene 124:257-262. Another example of
such an AAV vector is psub201 (see Samulski (1987) J. Virol. 61:3096). Another exemplary AAV
vector is the Double-D ITR vector. Construction of the Double-D ITR vector is disclosed in US
Patent 5,478,745. Still other vectors are those disclosed in Carter US Patent 4,797,368 and
Muzyczka US Patent 5,139,941, Chartejee US Patent 5,474,935, and Kotin WO94/288157. Yet a
further example of an AAV vector employable in this invention is SSV9AFABTKneo, which
contains the AFP enhancer and albumin promoter and directs expression predominantly in the liver.
Its structure and construction are disclosed in Su (1996) Human Gene Therapy 7:463-470.
Additional AAV gene therapy vectors are described in US 5,354,678, US 5,173,414, US 5,139,941,
and US 5,252,479.

The gene therapy vectors of the invention also include herpes vectors. Leading and preferred
examples are herpes simplex virus vectors containing a sequence encoding a thymidine kinase
polypeptide such as those disclosed in US 5,288,641 and EP0176170 (Roizman). Additional
exemplary herpes simplex virus vectors include HFEM/ICP6-LacZ disclosed in WO95/04139
(Wistar Institute), pHSVlac described in Geller (1988) Science 241:1667-1669 and in WO90/09441
and WO92/07945, HSV Us3::pgC-lacZ described in Fink (1992) Human Gene Therapy 3:11-19
and HSV 7134, 2 RH 105 and GAM described in EP 0453242 (Breakefield), and those deposited
with the ATCC as accession numbers ATCC VR-977 and ATCC VR-260.

Also contemplated are alpha virus gene therapy vectors that can be employed in this invention.
Preferred alpha virus vectors are Sindbis virus vectors. Togaviruses, Semliki Forest virus (ATCC
VR-67; ATCC VR-1247), Middleberg virus (ATCC VR-370), Ross River virus (ATCC VR-373;
ATCC VR-1246), Venezuelan equine encephalitis virus (ATCC VR923; ATCC VR-1250; ATCC
VR-1249; ATCC VR-532), and those described in US patents 5,091,309, 5,217,879, and
WO92/10578. More particularly, those alpha virus vectors described in US Serial No. 08/405,627,
filed March 15, 1995, WO94/21792, WO92/10578, WO95/07994, US 5,091,309 and US 5,217,879
are employable. Such alpha viruses may be obtained from depositories or collections such as the
ATCC in Rockville, Maryland or isolated from known sources using commonly available
techniques. Preferably, alphavirus vectors with reduced cytotoxicity are used (see USSN
08/679640).

DNA vector systems such as eukaryotic layered expression systems are also useful for expressing
the nucleic acids of the invention. See WO95/07994 for a detailed description of eukaryotic layered
expression systems. Preferably, the eukaryotic layered expression systems of the invention are
derived from alphavirus vectors and most preferably from Sindbis viral vectors.

Other viral vectors suitable for use in the present invention include those derived from poliovirus, for
example ATCC VR-58 and those described in Evans, Nature 339 (1989) 385 and Sabin (1973) J. Biol.
Standardization 1:115; rhinovirus, for example ATCC VR-1110 and those described in Arnold (1990)
J Cell Biochem L401; pox viruses such as canary pox virus or vaccinia virus, for example ATCC
VR-111 and ATCC VR-2010 and those described in Fisher-Hoch (1989) Proc Natl Acad Sci 86:317;
Flexner (1989) Ann NY Acad Sci 569:86, Flexner (1990) Vaccine 8:17; in US 4,603,112 and US
4,769,330 and WO89/01973; SV40 virus, for example ATCC VR-305 and those described in
Mulligan (1979) Nature 277:108 and Madzak (1992) J Gen Virol 73:1533; influenza virus, for
example ATCC VR-797 and recombinant influenza viruses made employing reverse genetics
techniques as described in US 5,166,057 and in Enami (1990) Proc Natl Acad Sci 87:3802-3805;
Enami & Palese (1991) J Virol 65:2711-2713 and Luytjes (1989) Cell 59:110, (see also McMichael
(1983) NEJ Med 309:13, and Yap (1978) Nature 273:238 and Nature (1979) 277:108); human
immunodeficiency virus as described in EP-0386882 and in Buchschacher (1992) J. Virol 66:2731;
measles virus, for example ATCC VR-67 and VR-1247 and those described in EP-0440219; Aura
virus, for example ATCC VR-368; Bebaru virus, for example ATCC VR-600 and ATCC VR-1240;
Cabassou virus, for example ATCC VR-922; Chikungunya virus, for example ATCC VR-64 and
ATCC VR-1241; Fort Morgan Virus, for example ATCC VR-924; Getah virus, for example ATCC
VR-369 and ATCC VR-1243; Kyzylagach virus, for example ATCC VR-927; Mayaro virus, for
example ATCC VR-66; Mucambo virus, for example ATCC VR-580 and ATCC VR-1244; Ndumu
virus, for example ATCC VR-371; Pixuna virus, for example ATCC VR-372 and ATCC VR-1245;
Tonate virus, for example ATCC VR-925; Triniti virus, for example ATCC VR-469; Una virus, for
example ATCC VR-374; Whataroa virus, for example ATCC VR-926; Y-62-33 virus, for example
ATCC VR-375; O'Nyong virus, Eastern encephalitis virus, for example ATCC VR-65 and ATCC
VR-1242; Western encephalitis virus, for example ATCC VR-70, ATCC VR-1251, ATCC VR-622
and ATCC VR-1252; and coronavirus, for example ATCC VR-740 and those described in Hamre
(1966) Proc Soc Exp Biol Med 121:190.

Delivery of the compositions of this invention into cells is not limited to the above mentioned viral
vectors. Other delivery methods and media may be employed such as, for example, nucleic acid
expression vectors, polycationic condensed DNA linked or unlinked to killed adenovirus alone, for
example see US Serial No. 08/366,787, filed December 30, 1994 and Curiel (1992) Hum Gene Ther
3:147-154, ligand linked DNA, for example see Wu (1989) J Biol Chem 264:16985-16987,
eukaryotic cell delivery vehicle cells, for example see US Serial No. 08/240,030, filed May 9,
1994, and US Serial No. 08/404,796, deposition of photopolymerized hydrogel materials,
hand-held gene transfer particle gun, as described in US Patent 5,149,655, ionizing radiation as
described in US 5,206,152 and in WO92/11033, nucleic charge neutralization or fusion with cell
membranes. Additional approaches are described in Philip (1994) Mol Cell Biol 14:2411-2418 and
in Woffendin (1994) Proc Natl Acad Sci 91:11581-11585.

Particle mediated gene transfer may be employed, for example see US Serial No. 60/023,867.
Briefly, the sequence can be inserted into conventional vectors that contain conventional control
sequences for high level expression, and then incubated with synthetic gene transfer molecules such
as polymeric DNA-binding cations like polylysine, protamine, and albumin, linked to cell targeting
ligands such as asialoorosomucoid, as described in Wu & Wu (1987) J. Biol. Chem.
262:4429-4432, insulin as described in Hucked (1990) Biochem Pharmacol 40:253-263, galactose
as described in Plank (1992) Bioconjugate Chem 3:533-539, lactose or transferrin.

Naked DNA may also be employed. Exemplary naked DNA introduction methods are described in 
WO 90/11092 and US 5,580,859. Uptake efficiency may be improved using biodegradable latex 
beads. DNA coated latex beads are efficiently transported into cells after endocytosis initiation by the 
beads. The method may be improved further by treatment of the beads to increase hydrophobicity and
thereby facilitate disruption of the endosome and release of the DNA into the cytoplasm. 

Liposomes that can act as gene delivery vehicles are described in US 5,422,120, WO95/13796,
WO94/23697, WO91/14445 and EP-524,968. As described in USSN 60/023,867, on non-viral
delivery, the nucleic acid sequences encoding a polypeptide can be inserted into conventional
vectors that contain conventional control sequences for high level expression, and then be incubated
with synthetic gene transfer molecules such as polymeric DNA-binding cations like polylysine,
protamine, and albumin, linked to cell targeting ligands such as asialoorosomucoid, insulin,
galactose, lactose, or transferrin. Other delivery systems include the use of liposomes to encapsulate
DNA comprising the gene under the control of a variety of tissue-specific or ubiquitously-active
promoters. Further non-viral delivery suitable for use includes mechanical delivery systems such
as the approach described in Woffendin et al (1994) Proc. Natl. Acad. Sci. USA
91(24):11581-11585. Moreover, the coding sequence and the product of expression of such can be
delivered through deposition of photopolymerized hydrogel materials. Other conventional methods
for gene delivery that can be used for delivery of the coding sequence include, for example, use of
hand-held gene transfer particle gun, as described in US 5,149,655; use of ionizing radiation for
activating transferred gene, as described in US 5,206,152 and WO92/11033.

Exemplary liposome and polycationic gene delivery vehicles are those described in US 5,422,120
and 4,762,915; in WO 95/13796; WO94/23697; and WO91/14445; in EP-0524968; and in Stryer,
Biochemistry, pages 236-240 (1975) W.H. Freeman, San Francisco; Szoka (1980) Biochem
Biophys Acta 600:1; Bayer (1979) Biochem Biophys Acta 550:464; Rivnay (1987) Meth Enzymol
149:119; Wang (1987) Proc Natl Acad Sci 84:7851; Plant (1989) Anal Biochem 176:420.

A polynucleotide composition can comprise a therapeutically effective amount of a gene therapy
vehicle, as the term is defined above. For purposes of the present invention, an effective dose will
be from about 0.01 mg/kg to 50 mg/kg or 0.05 mg/kg to about 10 mg/kg of the DNA constructs
in the individual to which it is administered.
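
The per-kilogram ranges above can be turned into a total amount of DNA construct for a given individual. The following is a minimal sketch only; the function name `dose_range_mg` and its parameters are hypothetical and are not part of the specification, and the defaults simply restate the broader range quoted above.

```python
def dose_range_mg(body_weight_kg, low_mg_per_kg=0.01, high_mg_per_kg=50.0):
    """Return (minimum, maximum) total dose in mg for one individual.

    Defaults restate the broader range in the text (0.01 to 50 mg/kg).
    Pass 0.05 and 10.0 for the narrower range. Illustrative helper only.
    """
    if body_weight_kg <= 0:
        raise ValueError("body weight must be positive")
    if low_mg_per_kg > high_mg_per_kg:
        raise ValueError("lower bound must not exceed upper bound")
    return body_weight_kg * low_mg_per_kg, body_weight_kg * high_mg_per_kg


if __name__ == "__main__":
    low, high = dose_range_mg(70.0)
    print(f"70 kg subject, broad range: {low:.2f} mg to {high:.0f} mg")
    low, high = dose_range_mg(70.0, 0.05, 10.0)
    print(f"70 kg subject, narrow range: {low:.2f} mg to {high:.0f} mg")
```

The arithmetic is a plain multiplication of body weight by each bound; it says nothing about which point in the range is appropriate.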

Delivery Methods

Once formulated, the polynucleotide compositions of the invention can be administered (1) directly
to the subject; (2) delivered ex vivo, to cells derived from the subject; or (3) in vitro for expression
of recombinant proteins. The subjects to be treated can be mammals or birds. Also, human subjects
can be treated.

Direct delivery of the compositions will generally be accomplished by injection, either
subcutaneously, intraperitoneally, intravenously or intramuscularly or delivered to the interstitial
space of a tissue. The compositions can also be administered into a lesion. Other modes of
administration include oral and pulmonary administration, suppositories, and transdermal or
transcutaneous applications (eg. see WO98/20734), needles, and gene guns or hyposprays. Dosage
treatment may be a single dose schedule or a multiple dose schedule.

Methods for the ex vivo delivery and reimplantation of transformed cells into a subject are known
in the art and described in eg. WO93/14778. Examples of cells useful in ex vivo applications
include, for example, stem cells, particularly hematopoietic, lymph cells, macrophages, dendritic
cells, or tumor cells.






WO 99/24578 PCT/IB98/01665


Generally, delivery of nucleic acids for both ex vivo and in vitro applications can be accomplished
by the following procedures, for example, dextran-mediated transfection, calcium phosphate
precipitation, polybrene mediated transfection, protoplast fusion, electroporation, encapsulation of
the polynucleotide(s) in liposomes, and direct microinjection of the DNA into nuclei, all well
known in the art.

Polynucleotide and polypeptide pharmaceutical compositions

In addition to the pharmaceutically acceptable carriers and salts described above, the following 
additional agents can be used with polynucleotide and/or polypeptide compositions. 

A. Polypeptides

One example is polypeptides, which include, without limitation: asialoorosomucoid (ASOR);
transferrin; asialoglycoproteins; antibodies; antibody fragments; ferritin; interleukins; interferons;
granulocyte-macrophage colony stimulating factor (GM-CSF); granulocyte colony stimulating
factor (G-CSF); macrophage colony stimulating factor (M-CSF); stem cell factor; and
erythropoietin. Viral antigens, such as envelope proteins, can also be used. Also, proteins from
other invasive organisms can be used, such as the 17 amino acid peptide from the circumsporozoite
protein of Plasmodium falciparum known as RII.

B. Hormones, Vitamins, etc.

Other groups that can be included are, for example: hormones, steroids, androgens, estrogens,
thyroid hormone, or vitamins, folic acid.

C. Polyalkylenes, Polysaccharides, etc.

Also, polyalkylene glycol can be included with the desired polynucleotides/polypeptides. In a
preferred embodiment, the polyalkylene glycol is polyethylene glycol. In addition, mono-, di-, or
polysaccharides can be included. In a preferred embodiment of this aspect, the polysaccharide is
dextran or DEAE-dextran. Also, chitosan and poly(lactide-co-glycolide) can be included.

D. Lipids and Liposomes

The desired polynucleotide/polypeptide can also be encapsulated in lipids or packaged in liposomes
prior to delivery to the subject or to cells derived therefrom.

Lipid encapsulation is generally accomplished using liposomes which are able to stably bind or
entrap and retain nucleic acid. The ratio of condensed polynucleotide to lipid preparation can vary
but will generally be around 1:1 (mg DNA:micromoles lipid), or more of lipid. For a review of the
use of liposomes as carriers for delivery of nucleic acids, see Hug and Sleight (1991) Biochim.
Biophys. Acta. 1097:1-17; Straubinger (1983) Meth. Enzymol. 101:512-527.

Liposomal preparations for use in the present invention include cationic (positively charged),
anionic (negatively charged) and neutral preparations. Cationic liposomes have been shown to
mediate intracellular delivery of plasmid DNA (Felgner (1987) Proc. Natl. Acad. Sci. USA
84:7413-7416); mRNA (Malone (1989) Proc. Natl. Acad. Sci. USA 86:6077-6081); and purified
transcription factors (Debs (1990) J. Biol. Chem. 265:10189-10192), in functional form.

Cationic liposomes are readily available. For example, N-[1-(2,3-dioleyloxy)propyl]-N,N,N-trimethylammonium
(DOTMA) liposomes are available under the trademark Lipofectin, from GIBCO BRL, Grand
Island, NY. (See also Felgner supra). Other commercially available liposomes include
Transfectace (DDAB/DOPE) and DOTAP/DOPE (Boehringer). Other cationic liposomes can be
prepared from readily available materials using techniques well known in the art. See, eg. Szoka
(1978) Proc. Natl. Acad. Sci. USA 75:4194-4198; WO90/11092 for a description of the synthesis
of DOTAP (1,2-bis(oleoyloxy)-3-(trimethylammonio)propane) liposomes.

Similarly, anionic and neutral liposomes are readily available, such as from Avanti Polar Lipids
(Birmingham, AL), or can be easily prepared using readily available materials. Such materials include
phosphatidyl choline, cholesterol, phosphatidyl ethanolamine, dioleoylphosphatidyl choline (DOPC),
dioleoylphosphatidyl glycerol (DOPG), dioleoylphosphatidyl ethanolamine (DOPE), among others.
These materials can also be mixed with the DOTMA and DOTAP starting materials in appropriate
ratios. Methods for making liposomes using these materials are well known in the art.

The liposomes can comprise multilamellar vesicles (MLVs), small unilamellar vesicles (SUVs),
or large unilamellar vesicles (LUVs). The various liposome-nucleic acid complexes are prepared
using methods known in the art. See eg. Straubinger (1983) Meth. Enzymol. 101:512-527; Szoka
(1978) Proc. Natl. Acad. Sci. USA 75:4194-4198; Papahadjopoulos (1975) Biochim. Biophys. Acta
394:483; Wilson (1979) Cell 17:77; Deamer & Bangham (1976) Biochim. Biophys. Acta 443:629;
Ostro (1977) Biochem. Biophys. Res. Commun. 76:836; Fraley (1979) Proc. Natl. Acad. Sci. USA
76:3348; Enoch & Strittmatter (1979) Proc. Natl. Acad. Sci. USA 76:145; Fraley (1980) J. Biol.
Chem. 255:10431; Szoka & Papahadjopoulos (1978) Proc. Natl. Acad. Sci. USA 75:145;
and Schaefer-Ridder (1982) Science 215:166.

E. Lipoproteins

In addition, lipoproteins can be included with the polynucleotide/polypeptide to be delivered.
Examples of lipoproteins to be utilized include: chylomicrons, HDL, IDL, LDL, and VLDL. Mutants,
fragments, or fusions of these proteins can also be used. Also, modifications of naturally occurring
lipoproteins can be used, such as acetylated LDL. These lipoproteins can target the delivery of
polynucleotides to cells expressing lipoprotein receptors. Preferably, if lipoproteins are included with
the polynucleotide to be delivered, no other targeting ligand is included in the composition.

Naturally occurring lipoproteins comprise a lipid and a protein portion. The protein portions are
known as apoproteins. At present, apoproteins A, B, C, D, and E have been isolated and
identified. At least two of these contain several proteins, designated by Roman numerals: AI, AII,
AIV; CI, CII, CIII.

A lipoprotein can comprise more than one apoprotein. For example, naturally occurring
chylomicrons comprise A, B, C, and E apoproteins; over time these lipoproteins lose A and acquire
C and E apoproteins. VLDL comprises A, B, C, and E apoproteins; LDL comprises apoprotein B; and
HDL comprises apoproteins A, C, and E.

The amino acid sequences of these apoproteins are known and are described in, for example, Breslow
(1985) Annu Rev. Biochem 54:699; Law (1986) Adv. Exp Med. Biol. 151:162; Chen (1986) J Biol Chem
261:12918; Kane (1980) Proc Natl Acad Sci USA 77:2465; and Utermann (1984) Hum Genet 65:232.

Lipoproteins contain a variety of lipids including triglycerides, cholesterol (free and esters), and
phospholipids. The composition of the lipids varies in naturally occurring lipoproteins. For example,
chylomicrons comprise mainly triglycerides. A more detailed description of the lipid content of
naturally occurring lipoproteins can be found, for example, in Meth. Enzymol. 128 (1986). The
composition of the lipids is chosen to aid in conformation of the apoprotein for receptor binding
activity. The composition of lipids can also be chosen to facilitate hydrophobic interaction and
association with the polynucleotide binding molecule.

Naturally occurring lipoproteins can be isolated from serum by ultracentrifugation, for instance.
Such methods are described in Meth. Enzymol. (supra); Pitas (1980) J. Biochem. 255:5454-5460
and Mahey (1979) J Clin. Invest 64:743-750. Lipoproteins can also be produced by in vitro or
recombinant methods by expression of the apoprotein genes in a desired host cell. See, for example,
Atkinson (1986) Annu Rev Biophys Chem 15:403 and Radding (1958) Biochim Biophys Acta 30:
443. Lipoproteins can also be purchased from commercial suppliers, such as Biomedical
Technologies, Inc., Stoughton, Massachusetts, USA. Further description of lipoproteins can be
found in Zuckermann et al PCT/US97/14465.

F. Polycationic Agents

Polycationic agents can be included, with or without lipoprotein, in a composition with the desired
polynucleotide/polypeptide to be delivered.

Polycationic agents typically exhibit a net positive charge at physiologically relevant pH and are
capable of neutralizing the electrical charge of nucleic acids to facilitate delivery to a desired
location. These agents have in vitro, ex vivo, and in vivo applications. Polycationic agents can
be used to deliver nucleic acids to a living subject either intramuscularly, subcutaneously, etc.

The following are examples of useful polypeptides as polycationic agents: polylysine, polyarginine,
polyornithine, and protamine. Other examples include histones, protamines, human serum albumin,
DNA binding proteins, non-histone chromosomal proteins, and coat proteins from DNA viruses, such
as ΦX174. Transcriptional factors also contain domains that bind DNA and therefore may be useful
as nucleic acid condensing agents. Briefly, transcriptional factors such as C/CEBP, c-jun, c-fos,
AP-1, AP-2, AP-3, CPF, Prot-1, Sp-1, Oct-1, Oct-2, CREP, and TFIID contain basic domains that
bind DNA sequences.

Organic polycationic agents include: spermine, spermidine, and putrescine.

The dimensions and the physical properties of a polycationic agent can be extrapolated from the
list above, to construct other polypeptide polycationic agents or to produce synthetic polycationic
agents.

Synthetic polycationic agents which are useful include, for example, DEAE-dextran and polybrene.
Lipofectin™ and lipofectAMINE™ are monomers that form polycationic complexes when
combined with polynucleotides/polypeptides.

Immunodiagnostic Assays

Neisserial antigens of the invention can be used in immunoassays to detect antibody levels (or,
conversely, anti-Neisserial antibodies can be used to detect antigen levels). Immunoassays based
on well defined, recombinant antigens can be developed to replace invasive diagnostic methods.
Antibodies to Neisserial proteins within biological samples, including for example, blood or serum
samples, can be detected. Design of the immunoassays is subject to a great deal of variation, and
a variety of these are known in the art. Protocols for the immunoassay may be based, for example,
upon competition, or direct reaction, or sandwich type assays. Protocols may also, for example, use
solid supports, or may be by immunoprecipitation. Most assays involve the use of labeled antibody
or polypeptide; the labels may be, for example, fluorescent, chemiluminescent, radioactive, or dye
molecules. Assays which amplify the signals from the probe are also known; examples of which
are assays which utilize biotin and avidin, and enzyme-labeled and mediated immunoassays, such
as ELISA assays.

Kits suitable for immunodiagnosis and containing the appropriate labeled reagents are constructed
by packaging the appropriate materials, including the compositions of the invention, in suitable
containers, along with the remaining reagents and materials (for example, suitable buffers, salt
solutions, etc.) required for the conduct of the assay, as well as a suitable set of assay instructions.

Nucleic Acid Hybridisation 

"Hybridization" refers to the association of two nucleic acid sequences to one another by hydrogen 
bonding. Typically, one sequence will be fixed to a solid support and the other will be free in solution.
Then, the two sequences will be placed in contact with one another under conditions that favor
hydrogen bonding. Factors that affect this bonding include: the type and volume of solvent; reaction
temperature; time of hybridization; agitation; agents to block the non-specific attachment of the liquid
phase sequence to the solid support (Denhardt's reagent or BLOTTO); concentration of the sequences;
use of compounds to increase the rate of association of sequences (dextran sulfate or polyethylene
glycol); and the stringency of the washing conditions following hybridization. See Sambrook et al
[supra] Volume 2, chapter 9, pages 9.47 to 9.57.

"Stringency" refers to conditions in a hybridization reaction that favor association of very similar 
sequences over sequences that differ. For example, the combination of temperature and salt 
concentration should be chosen that is approximately 12° to 20°C below the calculated Tm of the
hybrid under study. The temperature and salt conditions can often be determined empirically in
preliminary experiments in which samples of genomic DNA immobilized on filters are hybridized
to the sequence of interest and then washed under conditions of different stringencies. See 
Sambrook et al at page 9.50. 

Variables to consider when performing, for example, a Southern blot are (1) the complexity of the
DNA being blotted and (2) the homology between the probe and the sequences being detected. The
total amount of the fragment(s) to be studied can vary by a magnitude of 10, from 0.1 to 1 µg for a
plasmid or phage digest to 10⁻⁹ to 10⁻⁸ g for a single copy gene in a highly complex eukaryotic
genome. For lower complexity polynucleotides, substantially shorter blotting, hybridization, and
exposure times, a smaller amount of starting polynucleotides, and lower specific activity of probes
can be used. For example, a single-copy yeast gene can be detected with an exposure time of only
1 hour starting with 1 µg of yeast DNA, blotting for two hours, and hybridizing for 4-8 hours with
a probe of 10⁸ cpm/µg. For a single-copy mammalian gene a conservative approach would start
with 10 µg of DNA, blot overnight, and hybridize overnight in the presence of 10% dextran sulfate
using a probe of greater than 10⁸ cpm/µg, resulting in an exposure time of ~24 hours.

Several factors can affect the melting temperature (Tm) of a DNA-DNA hybrid between the probe
and the fragment of interest, and consequently, the appropriate conditions for hybridization and
washing. In many cases the probe is not 100% homologous to the fragment. Other commonly
encountered variables include the length and total G+C content of the hybridizing sequences and
the ionic strength and formamide content of the hybridization buffer. The effects of all of these
factors can be approximated by a single equation:

$$T_m = 81 + 16.6\,\log_{10}(C_i) + 0.4\,[\%(G+C)] - 0.6\,(\%\,\text{formamide}) - \frac{600}{n} - 1.5\,(\%\,\text{mismatch})$$

where Ci is the salt concentration (monovalent ions) and n is the length of the hybrid in base pairs
(slightly modified from Meinkoth & Wahl (1984) Anal. Biochem. 138:267-284).
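
The equation can be evaluated directly. The sketch below is illustrative only: the function name and argument names are hypothetical, the coefficients are those of the equation as stated in this text, and the salt concentration is assumed to be expressed in mol/L (the text does not state the unit).

```python
import math


def melting_temperature_c(salt_molar, gc_percent, formamide_percent,
                          length_bp, mismatch_percent=0.0):
    """Approximate Tm (degrees C) of a DNA-DNA hybrid.

    Implements the equation given in the text:
    Tm = 81 + 16.6*log10(Ci) + 0.4*(%G+C) - 0.6*(%formamide)
         - 600/n - 1.5*(%mismatch)
    Assumption: salt_molar (Ci) is the monovalent ion concentration in mol/L.
    """
    if salt_molar <= 0:
        raise ValueError("salt concentration must be positive")
    if length_bp <= 0:
        raise ValueError("hybrid length must be positive")
    return (81.0
            + 16.6 * math.log10(salt_molar)
            + 0.4 * gc_percent
            - 0.6 * formamide_percent
            - 600.0 / length_bp
            - 1.5 * mismatch_percent)


if __name__ == "__main__":
    # Hypothetical inputs chosen only to exercise each term.
    tm = melting_temperature_c(salt_molar=1.0, gc_percent=50.0,
                               formamide_percent=50.0, length_bp=600)
    print(f"Estimated Tm: {tm:.1f} C")
```

Each percentage point of mismatch lowers the estimate by 1.5°C, which is why probes that are not fully homologous call for lower hybridization temperatures.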

In designing a hybridization experiment, some factors affecting nucleic acid hybridization can be
conveniently altered. The temperature of the hybridization and washes and the salt concentration
during the washes are the simplest to adjust. As the temperature of the hybridization increases (ie.
stringency), it becomes less likely for hybridization to occur between strands that are
nonhomologous, and as a result, background decreases. If the radiolabeled probe is not completely
homologous with the immobilized fragment (as is frequently the case in gene family and
interspecies hybridization experiments), the hybridization temperature must be reduced, and
background will increase. The temperature of the washes affects the intensity of the hybridizing
band and the degree of background in a similar manner. The stringency of the washes is also
increased with decreasing salt concentrations.

In general, convenient hybridization temperatures in the presence of 50% formamide are 42°C for
a probe which is 95% to 100% homologous to the target fragment, 37°C for 90% to 95% homology,
and 32°C for 85% to 90% homology. For lower homologies, formamide content should be lowered
and temperature adjusted accordingly, using the equation above. If the homology between the probe
and the target fragment is not known, the simplest approach is to start with both hybridization and
wash conditions which are nonstringent. If non-specific bands or high background are observed
after autoradiography, the filter can be washed at high stringency and reexposed. If the time
required for exposure makes this approach impractical, several hybridization and/or washing
stringencies should be tested in parallel.
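
The temperature guidance above amounts to a small lookup. The sketch below is illustrative: the function name is hypothetical, the middle value is read as 37°C, and assigning a boundary value such as exactly 95% to the higher band is an assumption, since the ranges in the text share their end points.

```python
def suggested_hybridization_temp_c(homology_percent):
    """Suggest a hybridization temperature (degrees C) in 50% formamide.

    Bands follow the text: 95-100% -> 42, 90-95% -> 37, 85-90% -> 32.
    Assumption: a shared boundary value is assigned to the higher band.
    Returns None below 85%, where the text advises lowering formamide
    and recalculating from the Tm equation instead.
    """
    if not 0 <= homology_percent <= 100:
        raise ValueError("homology must be between 0 and 100 percent")
    if homology_percent >= 95:
        return 42
    if homology_percent >= 90:
        return 37
    if homology_percent >= 85:
        return 32
    return None


if __name__ == "__main__":
    for homology in (98, 92, 86, 70):
        print(homology, suggested_hybridization_temp_c(homology))
```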

Nucleic Acid Probe Assays 

Methods such as PCR, branched DNA probe assays, or blotting techniques utilizing nucleic acid
probes according to the invention can determine the presence of cDNA or mRNA. A probe is said
to "hybridize" with a sequence of the invention if it can form a duplex or double stranded complex,
which is stable enough to be detected.

The nucleic acid probes will hybridize to the Neisserial nucleotide sequences of the invention
(including both sense and antisense strands). Though many different nucleotide sequences will
encode the amino acid sequence, the native Neisserial sequence is preferred because it is the actual
sequence present in cells. mRNA represents a coding sequence and so a probe should be
complementary to the coding sequence; single-stranded cDNA is complementary to mRNA, and
so a cDNA probe should be complementary to the non-coding sequence.
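
The strand relationships described above can be illustrated with a reverse-complement helper. This is a sketch under stated assumptions: the function names are hypothetical, sequences are written 5' to 3' as DNA, and the example sequence is arbitrary rather than taken from the sequence listing.

```python
_COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(dna):
    """Return the reverse complement of a DNA sequence written 5' to 3'."""
    dna = dna.upper()
    if set(dna) - set("ACGT"):
        raise ValueError("sequence may contain only A, C, G and T")
    return dna.translate(_COMPLEMENT)[::-1]


def probe_for_mrna(coding_strand):
    """Probe for detecting mRNA: complementary to the coding sequence."""
    return reverse_complement(coding_strand)


def probe_for_single_stranded_cdna(coding_strand):
    """Probe for first-strand cDNA: complementary to the non-coding
    strand, so it reads the same as the coding strand itself."""
    return reverse_complement(reverse_complement(coding_strand))


if __name__ == "__main__":
    coding = "ATGGCC"  # arbitrary illustrative sequence
    print(probe_for_mrna(coding))
    print(probe_for_single_stranded_cdna(coding))
```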

The probe sequence need not be identical to the Neisserial sequence (or its complement); some
variation in the sequence and length can lead to increased assay sensitivity if the nucleic acid probe
can form a duplex with target nucleotides, which can be detected. Also, the nucleic acid probe can
include additional nucleotides to stabilize the formed duplex. Additional Neisserial sequence may
also be helpful as a label to detect the formed duplex. For example, a non-complementary
nucleotide sequence may be attached to the 5' end of the probe, with the remainder of the probe
sequence being complementary to a Neisserial sequence. Alternatively, non-complementary bases
or longer sequences can be interspersed into the probe, provided that the probe sequence has
sufficient complementarity with a Neisserial sequence in order to hybridize therewith and
thereby form a duplex which can be detected.

The exact length and sequence of the probe will depend on the hybridization conditions, such as
temperature, salt condition and the like. For example, for diagnostic applications, depending on the
complexity of the analyte sequence, the nucleic acid probe typically contains at least 10-20
nucleotides, preferably 15-25, and more preferably at least 30 nucleotides, although it may be
shorter than this. Short primers generally require cooler temperatures to form sufficiently stable
hybrid complexes with the template.

Probes may be produced by synthetic procedures, such as the triester method of Matteucci et al
[J. Am. Chem. Soc. (1981) 103:3185], or according to Urdea et al [Proc. Natl. Acad. Sci. USA
(1983) 80:7461], or using commercially available automated oligonucleotide synthesizers.

The chemical nature of the probe can be selected according to preference. For certain applications,
DNA or RNA are appropriate. For other applications, modifications may be incorporated eg.
backbone modifications, such as phosphorothioates or methylphosphonates, can be used to increase
in vivo half-life, alter RNA affinity, increase nuclease resistance etc. [eg. see Agrawal & Iyer
(1995) Curr Opin Biotechnol 6:12-19; Agrawal (1996) TIBTECH 14:376-387]; analogues such as
peptide nucleic acids may also be used [eg. see Corey (1997) TIBTECH 15:224-229; Buchardt et
al (1993) TIBTECH 11:384-386].

Alternatively, the polymerase chain reaction (PCR) is another well-known means for detecting
small amounts of target nucleic acids. The assay is described in: Mullis et al [Meth. Enzymol.
(1987) 155:335-350]; US patents 4,683,195 and 4,683,202. Two "primer" nucleotides hybridize
with the target nucleic acids and are used to prime the reaction. The primers can comprise sequence
that does not hybridize to the sequence of the amplification target (or its complement) to aid with
duplex stability or, for example, to incorporate a convenient restriction site. Typically, such
sequence will flank the desired Neisserial sequence.
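
A primer carrying non-hybridizing flanking sequence can be sketched as a simple concatenation. Everything specific here is an illustrative assumption rather than part of the specification: the function name, the choice of the EcoRI recognition site (GAATTC), the two-base 5' clamp, and the example annealing sequence.

```python
def tailed_primer(annealing_region, restriction_site="GAATTC", clamp="GC"):
    """Build a primer (5' to 3') with a non-hybridizing 5' tail.

    The tail is clamp + restriction_site; only annealing_region is meant
    to hybridize to the target. Defaults are hypothetical examples:
    GAATTC is the EcoRI recognition site, and the clamp adds a few extra
    bases 5' of the site.
    """
    parts = [clamp.upper(), restriction_site.upper(), annealing_region.upper()]
    for part in parts:
        if set(part) - set("ACGT"):
            raise ValueError("sequences may contain only A, C, G and T")
    if not parts[2]:
        raise ValueError("annealing region must not be empty")
    return "".join(parts)


if __name__ == "__main__":
    primer = tailed_primer("ATGAAACGT")  # arbitrary annealing sequence
    print(primer, len(primer))
```

Because the tail sits at the 5' end, it flanks the amplified sequence in the product without interfering with extension from the 3' end.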

A thermostable polymerase creates copies of target nucleic acids from the primers using the
original target nucleic acids as a template. After a threshold amount of target nucleic acids are
generated by the polymerase, they can be detected by more traditional methods, such as Southern
blots. When using the Southern blot method, the labelled probe will hybridize to the Neisserial
sequence (or its complement).

Also, mRNA or cDNA can be detected by traditional blotting techniques described in Sambrook
et al [supra]. mRNA, or cDNA generated from mRNA using a polymerase enzyme, can be purified
and separated using gel electrophoresis. The nucleic acids on the gel are then blotted onto a solid
support, such as nitrocellulose. The solid support is exposed to a labelled probe and then washed
to remove any unhybridized probe. Next, the duplexes containing the labeled probe are detected.
Typically, the probe is labelled with a radioactive moiety.

BRIEF DESCRIPTION OF THE DRAWINGS 

Figures 1-20 show biochemical data obtained in the Examples, and also sequence analysis, for
ORFs 37, 5, 2, 15, 22, 28, 32, 4, 61, 76, 89, 97, 106, 138, 23, 25, 27, 79, 85 and 132. M1 and M2
are molecular weight markers. Arrows indicate the position of the main recombinant product or,
in Western blots, the position of the main N.meningitidis immunoreactive band. TP indicates
N.meningitidis total protein extract; OMV indicates N.meningitidis outer membrane vesicle
preparation. In bactericidal assay results: a diamond (♦) shows preimmune data; a triangle (▲)
shows GST control data; a circle (•) shows data with recombinant N.meningitidis protein.
Computer analyses show a hydrophilicity plot (upper), an antigenic index plot (middle), and an
AMPHI analysis (lower). The AMPHI program has been used to predict T-cell epitopes [Gao et
al (1989) J. Immunol 143:3007; Roberts et al (1996) AIDS Res Hum Retrovir 12:593; Quakyi et
al (1992) Scand J Immunol suppl.11:9] and is available in the Protean package of DNASTAR, Inc.
(1228 South Park Street, Madison, Wisconsin 53715 USA).

EXAMPLES 

The examples describe nucleic acid sequences which have been identified in N.meningitidis, along
with their putative translation products, and also those of N.gonorrhoeae. Not all of the nucleic acid
sequences are complete ie. they encode less than the full-length wild-type protein.

The examples are generally in the following format:

• a nucleotide sequence which has been identified in N.meningitidis (strain B)

• the putative translation product of this sequence

• a computer analysis of the translation product based on database comparisons

• corresponding gene and protein sequences identified in N.meningitidis (strain A) and in
N.gonorrhoeae

• a description of the characteristics of the proteins which indicates that they might be
suitably antigenic

• results of biochemical analysis (expression, purification, ELISA, FACS etc.)
The examples typically include details of sequence identity between species and strains. Proteins
that are similar in sequence are generally similar in both structure and function, and the sequence
identity often indicates a common evolutionary origin. Comparison with sequences of proteins of
known function is widely used as a guide for the assignment of putative protein function to a new
sequence and has proved particularly useful in whole-genome analyses.

Sequence comparisons were performed at NCBI (http://www.ncbi.nlm.nih.gov) using the
algorithms BLAST, BLAST2, BLASTn, BLASTp, tBLASTn, BLASTx, & tBLASTx [eg. see also
Altschul et al (1997) Gapped BLAST and PSI-BLAST: a new generation of protein database
search programs. Nucleic Acids Research 25:3389-3402]. Searches were performed against the
following databases: non-redundant GenBank+EMBL+DDBJ+PDB sequences and non-redundant
GenBank CDS translations+PDB+SwissProt+SPupdate+PIR sequences.

To compare Meningococcal and Gonococcal sequences, the tBLASTx algorithm was used, as
implemented at http://www.genome.ou.edu/gono_blast.html. The FASTA algorithm was also used
to compare the ORFs (from GCG Wisconsin Package, version 9.0).

Dots within nucleotide sequences (eg. position 495 in SEQ ID 11) represent nucleotides which have
been arbitrarily introduced in order to maintain a reading frame. In the same way, double-underlined
nucleotides were removed. Lower case letters (eg. position 496 in SEQ ID 11) represent
ambiguities which arose during alignment of independent sequencing reactions (some of the
nucleotide sequences in the examples are derived from combining the results of two or more
experiments).

Nucleotide sequences were scanned in all six reading frames to predict the presence of hydrophobic
domains using an algorithm based on the statistical studies of Esposti et al [Critical evaluation of
the hydropathy of membrane proteins (1990) Eur J Biochem 190:207-219]. These domains
represent potential transmembrane regions or hydrophobic leader sequences.

Open reading frames were predicted from fragmented nucleotide sequences using the program
ORFFINDER (NCBI).
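
As an illustration of what scanning a nucleotide sequence in all six reading frames involves, the following is a minimal sketch in Python. It is not the ORFFINDER program nor the Esposti hydropathy algorithm; it assumes the standard genetic code, and the function names (`six_frames`, `longest_open_stretch`) are hypothetical names chosen for this example.

```python
from itertools import product

# Standard genetic code, built in TCAG order (an assumption of this sketch).
BASES = "TCAG"
AMINO = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {"".join(c): a for c, a in zip(product(BASES, repeat=3), AMINO)}


def reverse_complement(seq):
    """Return the reverse complement of an unambiguous DNA sequence."""
    return seq.translate(str.maketrans("ACGT", "TGCA"))[::-1]


def translate(seq):
    """Translate complete codons only; unknown codons (e.g. containing N) become X."""
    return "".join(CODON_TABLE.get(seq[i:i + 3], "X")
                   for i in range(0, len(seq) - 2, 3))


def six_frames(seq):
    """Translate both strands at the three possible codon offsets."""
    seq = seq.upper()
    frames = {}
    for strand, s in (("+", seq), ("-", reverse_complement(seq))):
        for offset in range(3):
            frames[f"{strand}{offset + 1}"] = translate(s[offset:])
    return frames


def longest_open_stretch(protein):
    """Longest run of residues uninterrupted by a stop codon (*)."""
    return max(protein.split("*"), key=len)


# First 30 nucleotides of SEQ ID 3 as given in Example 1.
frames = six_frames("ATGAAACAGACAGTCAAATGGCTTGCCGCC")
print(frames["+1"])
```

Frame +1 of this fragment gives the first ten residues of the ORF37-1 translation product; a real ORF prediction would additionally require a start codon and a minimum length.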

Underlined amino acid sequences indicate possible transmembrane domains or leader sequences
in the ORFs, as predicted by the PSORT algorithm (http://www.psort.nibb.ac.jp). Functional
domains were also predicted using the MOTIFS program (GCG Wisconsin & PROSITE).
Various tests can be used to assess the in vivo immunogenicity of the proteins identified in the
examples. For example, the proteins can be expressed recombinantly and used to screen patient sera
by immunoblot. A positive reaction between the protein and patient serum indicates that the patient
has previously mounted an immune response to the protein in question ie. the protein is an
immunogen. This method can also be used to identify immunodominant proteins.

The recombinant protein can also be conveniently used to prepare antibodies eg. in a mouse. These
can be used for direct confirmation that a protein is located on the cell-surface. Labelled antibody
(eg. fluorescent labelling for FACS) can be incubated with intact bacteria and the presence of label
on the bacterial surface confirms the location of the protein.

In particular, the following methods (A) to (S) were used to express, purify and biochemically
characterise the proteins of the invention:

A) Chromosomal DNA preparation 

N.meningitidis strain 2996 was grown to exponential phase in 100ml of GC medium, harvested by
centrifugation, and resuspended in 5ml buffer (20% Sucrose, 50mM Tris-HCl, 50mM EDTA, pH8).
After 10 minutes incubation on ice, the bacteria were lysed by adding 10ml lysis solution (50mM
NaCl, 1% Na-Sarkosyl, 50µg/ml Proteinase K), and the suspension was incubated at 37°C for 2
hours. Two phenol extractions (equilibrated to pH 8) and one CHCl3/isoamylalcohol (24:1)
extraction were performed. DNA was precipitated by addition of 0.3M sodium acetate and 2
volumes ethanol, and was collected by centrifugation. The pellet was washed once with 70%
ethanol and redissolved in 4ml buffer (10mM Tris-HCl, 1mM EDTA, pH 8). The DNA
concentration was measured by reading the OD at 260 nm.

B) Oligonucleotide design 

Synthetic oligonucleotide primers were designed on the basis of the coding sequence of each ORF,
using (a) the meningococcus B sequence when available, or (b) the gonococcus/meningococcus A
sequence, adapted to the codon preference usage of meningococcus as necessary. Any predicted
signal peptides were omitted, by deducing the 5'-end amplification primer sequence immediately
downstream from the predicted leader sequence.

For most ORFs, the 5' primers included two restriction enzyme recognition sites (BamHI-NdeI,
BamHI-NheI, or EcoRI-NheI, depending on the gene's own restriction pattern); the 3' primers included
an XhoI restriction site. This procedure was established in order to direct the cloning of each
amplification product (corresponding to each ORF) into two different expression systems: pGEX-KG
(using either BamHI-XhoI or EcoRI-XhoI), and pET21b+ (using either NdeI-XhoI or NheI-XhoI).

5'-end primer tail:   CGCGGATCCCATATG     (BamHI-NdeI)
                      CGCGGATCCGCTAGC     (BamHI-NheI)
                      CCGGAATTCTAGCTAGC   (EcoRI-NheI)

3'-end primer tail:   CCCGCTCGAG          (XhoI)

For ORFs 5, 15, 17, 19, 20, 22, 27, 28, 65 & 89, two different amplifications were performed to
clone each ORF in the two expression systems. Two different 5' primers were used for each ORF;
the same 3' XhoI primer was used as before:

5'-end primer tail:   GGAATTCCATATGGCCATGG   (NdeI)

5'-end primer tail:   CGGGATCC               (BamHI)

ORF 76 was cloned in the pTRC expression vector and expressed as an amino-terminus His-tag
fusion. In this particular case, the predicted signal peptide was included in the final product.
NheI-BamHI restriction sites were incorporated using primers:

5'-end primer tail:   GATCAGCTAGCCATATG   (NheI)

3'-end primer tail:   CGGGATCC            (BamHI)

As well as containing the restriction enzyme recognition sequences, the primers included
nucleotides which hybridized to the sequence to be amplified. The number of hybridizing
nucleotides depended on the melting temperature of the whole primer, and was determined for each
primer using the formulae:

Tm = 4 (G+C) + 2 (A+T)                  (tail excluded)

Tm = 64.9 + 0.41 (% GC) - 600/N         (whole primer)

The average melting temperature of the selected oligos was 65-70°C for the whole oligo and
50-55°C for the hybridising region alone.
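
The two melting temperature formulae can be written out as a short Python sketch. The function names are hypothetical, G, C, A and T are taken to be base counts and N the primer length, and the 18-nucleotide hybridising region below is an invented example rather than a primer from Table I; only the BamHI-NdeI tail is taken from the text.

```python
def tm_hybridising_region(seq):
    """Tm = 4(G+C) + 2(A+T), applied to the hybridising region (tail excluded)."""
    seq = seq.upper()
    gc = seq.count("G") + seq.count("C")
    at = seq.count("A") + seq.count("T")
    return 4 * gc + 2 * at


def tm_whole_primer(seq):
    """Tm = 64.9 + 0.41(%GC) - 600/N, applied to the whole primer (tail included)."""
    seq = seq.upper()
    n = len(seq)
    gc_percent = 100.0 * (seq.count("G") + seq.count("C")) / n
    return 64.9 + 0.41 * gc_percent - 600.0 / n


tail = "CGCGGATCCCATATG"            # BamHI-NdeI 5'-end tail given in the text
hybridising = "AAACAGACAGTCAAATGG"  # hypothetical hybridising region

print(tm_hybridising_region(hybridising))         # region alone
print(round(tm_whole_primer(tail + hybridising), 1))  # tail + region
```

In practice the hybridising region would be lengthened or shortened until both values fall within the ranges stated above.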

Table I (page 487) shows the forward and reverse primers used for each amplification. In certain
cases, it will be noted that the sequence of the primer does not exactly match the sequence in the
ORF. When initial amplifications were performed, the complete 5' and/or 3' sequence was not
known for some meningococcal ORFs, although the corresponding sequences had been identified
in gonococcus. For amplification, the gonococcal sequences could thus be used as the basis for
primer design, altered to take account of codon preference. In particular, the following codons were
changed: ATA→ATT; TCG→TCT; CAG→CAA; AAG→AAA; GAG→GAA; CGA→CGC;
CGG→CGC; GGG→GGC. Italicised nucleotides in Table I indicate such a change. It will be
appreciated that, once the complete sequence has been identified, this approach is generally no
longer necessary.
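
The codon changes listed above amount to a simple in-frame substitution, sketched here in Python. The function name `adapt_codons` and the input sequence are hypothetical; the sketch assumes the input starts in frame and applies the eight listed changes to every matching codon.

```python
# Codon changes listed in the text (gonococcal codon -> preferred codon).
CODON_CHANGES = {
    "ATA": "ATT", "TCG": "TCT", "CAG": "CAA", "AAG": "AAA",
    "GAG": "GAA", "CGA": "CGC", "CGG": "CGC", "GGG": "GGC",
}


def adapt_codons(coding_seq):
    """Apply the listed changes codon by codon, reading in frame from position 1."""
    seq = coding_seq.upper()
    codons = [seq[i:i + 3] for i in range(0, len(seq), 3)]
    # Codons not in the table (and any trailing partial codon) are left unchanged.
    return "".join(CODON_CHANGES.get(codon, codon) for codon in codons)


print(adapt_codons("ATGAAGCAGGGGTCG"))  # hypothetical in-frame fragment
```

Because every listed change is synonymous, the encoded amino acid sequence is unaffected; only codons read in frame are altered, so a triplet such as ATA spanning two codons is left alone.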

Oligos were synthesized by a Perkin Elmer 394 DNA/RNA Synthesizer, eluted from the columns
in 2ml NH4OH, and deprotected by 5 hours incubation at 56°C. The oligos were precipitated by
addition of 0.3M Na-Acetate and 2 volumes ethanol. The samples were then centrifuged and the
pellets resuspended in either 100µl or 1ml of water. OD260 was determined using a Perkin Elmer
Lambda Bio spectrophotometer and the concentration was determined and adjusted to 2-10pmol/µl.

C) Amplification 

The standard PCR protocol was as follows: 50-200ng of genomic DNA were used as a template
in the presence of 20-40µM of each oligo, 400-800µM dNTPs solution, 1x PCR buffer (including
1.5mM MgCl2), 2.5 units TaqI DNA polymerase (using Perkin-Elmer AmpliTaq, GIBCO
Platinum, Pwo DNA polymerase, or Takara Shuzo Taq polymerase).

In some cases, PCR was optimised by the addition of 10µl DMSO or 50µl 2M betaine.

After a hot start (adding the polymerase during a preliminary 3 minute incubation of the whole mix
at 95°C), each sample underwent a double-step amplification: the first 5 cycles were performed
using as the hybridization temperature that of the oligos excluding the restriction enzyme tail,
followed by 30 cycles performed according to the hybridization temperature of the whole-length
oligos. The cycles were followed by a final 10 minute extension step at 72°C.

The standard cycles were as follows:

                  Denaturation       Hybridisation         Elongation

First 5 cycles    30 seconds         30 seconds            30-60 seconds
                  95°C               50-55°C               72°C

Last 30 cycles    30 seconds         30 seconds            30-60 seconds
                                     65-70°C               72°C


The elongation time varied according to the length of the ORF to be amplified. 
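
The double-step cycling scheme can be summarised as a small data structure, sketched below in Python. The function names are hypothetical; the sketch assumes that denaturation is at 95°C in both stages (the text states this explicitly only for the hot start and the first 5 cycles) and it ignores instrument ramp times.

```python
def build_pcr_program(tm_hybridising_c, tm_whole_c, elongation_s=60, denaturation_c=95):
    """Return the two-stage scheme as a list of stages with (temperature, seconds) steps."""
    if not 30 <= elongation_s <= 60:
        raise ValueError("the text gives an elongation time of 30-60 seconds")

    def cycle(anneal_c):
        # denaturation, hybridisation, elongation
        return [(denaturation_c, 30), (anneal_c, 30), (72, elongation_s)]

    return [
        {"name": "hot start", "cycles": 1, "steps": [(95, 180)]},
        {"name": "first stage", "cycles": 5, "steps": cycle(tm_hybridising_c)},
        {"name": "second stage", "cycles": 30, "steps": cycle(tm_whole_c)},
        {"name": "final extension", "cycles": 1, "steps": [(72, 600)]},
    ]


def total_hold_seconds(program):
    """Sum of programmed hold times, excluding temperature ramps."""
    return sum(stage["cycles"] * sum(sec for _, sec in stage["steps"])
               for stage in program)


# Example annealing temperatures chosen from the stated 50-55°C and 65-70°C ranges.
program = build_pcr_program(tm_hybridising_c=52, tm_whole_c=68, elongation_s=60)
print(total_hold_seconds(program) / 60, "minutes of programmed hold time")
```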

The amplifications were performed using either a 9600 or a 2400 Perkin Elmer GeneAmp PCR
System. To check the results, 1/10 of the amplification volume was loaded onto a 1-1.5% agarose
gel and the size of each amplified fragment compared with a DNA molecular weight marker.

The amplified DNA was either loaded directly on a 1% agarose gel or first precipitated with ethanol
and resuspended in a suitable volume to be loaded on a 1% agarose gel. The DNA fragment
corresponding to the right size band was then eluted and purified from gel, using the Qiagen Gel
Extraction Kit, following the instructions of the manufacturer. The final volume of the DNA
fragment was 30µl or 50µl of either water or 10mM Tris, pH 8.5.

D) Digestion of PCR fragments

The purified DNA corresponding to the amplified fragment was split into 2 aliquots and
double-digested with:

- NdeI/XhoI or NheI/XhoI for cloning into pET-21b+ and further expression of the protein
as a C-terminus His-tag fusion

- BamHI/XhoI or EcoRI/XhoI for cloning into pGEX-KG and further expression of the
protein as N-terminus GST fusion.

- For ORF 76, NheI/BamHI for cloning into pTRC-HisA vector and further expression
of the protein as N-terminus His-tag fusion.

- EcoRI/PstI, EcoRI/SalI or SalI/PstI for cloning into pGEX-His and further expression of
the protein as N-terminus His-tag fusion

Each purified DNA fragment was incubated (37°C for 3 hours to overnight) with 20 units of each
restriction enzyme (New England Biolabs) in either a 30 or 40µl final volume in the presence of
the appropriate buffer. The digestion product was then purified using the QIAquick PCR
purification kit, following the manufacturer's instructions, and eluted in a final volume of 30 or
50µl of either water or 10mM Tris-HCl, pH 8.5. The final DNA concentration was determined by
1% agarose gel electrophoresis in the presence of titrated molecular weight marker.
E) Digestion of the cloning vectors (pET22B, pGEX-KG, pTRC-HisA, and pGEX-His)

10µg plasmid was double-digested with 50 units of each restriction enzyme in 200µl reaction
volume in the presence of appropriate buffer by overnight incubation at 37°C. After loading the
whole digestion on a 1% agarose gel, the band corresponding to the digested vector was purified
from the gel using the Qiagen QIAquick Gel Extraction Kit and the DNA was eluted in 50µl of
10mM Tris-HCl, pH 8.5. The DNA concentration was evaluated by measuring OD260 of the sample,
and adjusted to 50µg/µl. 1µl of plasmid was used for each cloning procedure.

The vector pGEX-His is a modified pGEX-2T vector carrying a region encoding six histidine
residues upstream to the thrombin cleavage site and containing the multiple cloning site of the
vector pTRC99 (Pharmacia).

F) Cloning 

The fragments corresponding to each ORF, previously digested and purified, were ligated in both pET22b
and pGEX-KG. In a final volume of 20µl, a molar ratio of 3:1 fragment/vector was ligated using 0.5µl
of NEB T4 DNA ligase (400 units/µl), in the presence of the buffer supplied by the manufacturer.
The reaction was incubated at room temperature for 3 hours. In some experiments, ligation was
performed using the Boehringer "Rapid Ligation Kit", following the manufacturer's instructions.
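
The 3:1 fragment/vector molar ratio translates into a mass of insert that depends on the relative sizes of the two molecules. The following Python sketch shows the arithmetic; the function name and the example vector mass and sizes are hypothetical and are not taken from the protocol.

```python
def insert_mass_ng(vector_ng, vector_bp, insert_bp, molar_ratio=3.0):
    """Mass of insert giving the requested insert:vector molar ratio.

    For double-stranded DNA, moles are proportional to mass / length, so
    insert_ng = ratio * vector_ng * insert_bp / vector_bp.
    """
    if min(vector_ng, vector_bp, insert_bp, molar_ratio) <= 0:
        raise ValueError("all arguments must be positive")
    return molar_ratio * vector_ng * insert_bp / vector_bp


# Hypothetical example: 50 ng of a 5000 bp vector and a 500 bp insert.
print(insert_mass_ng(vector_ng=50, vector_bp=5000, insert_bp=500))
```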

In order to introduce the recombinant plasmid in a suitable strain, 100µl E.coli DH5 competent
cells were incubated with the ligase reaction solution for 40 minutes on ice, then at 37°C for 3
minutes, then, after adding 800µl LB broth, again at 37°C for 20 minutes. The cells were then
centrifuged at maximum speed in an Eppendorf microfuge and resuspended in approximately 200µl
of the supernatant. The suspension was then plated on LB ampicillin (100mg/ml).

The screening of the recombinant clones was performed by growing 5 randomly-chosen colonies
overnight at 37°C in either 2ml (pGEX or pTC clones) or 5ml (pET clones) LB broth + 100µg/ml
ampicillin. The cells were then pelleted and the DNA extracted using the Qiagen QIAprep Spin
Miniprep Kit, following the manufacturer's instructions, to a final volume of 30µl. 5µl of each
individual miniprep (approximately 1µg) were digested with either NdeI/XhoI or BamHI/XhoI and
the whole digestion loaded onto a 1-1.5% agarose gel (depending on the expected insert size), in
parallel with the molecular weight marker (1Kb DNA Ladder, GIBCO). The screening of the
positive clones was made on the basis of the correct insert size.
For the cloning of ORFs 110, 111, 113, 115, 119, 122, 125 & 130, the double-digested PCR
product was ligated into double-digested vector using EcoRI-PstI cloning sites or, for ORFs 115
& 127, EcoRI-SalI or, for ORF 122, SalI-PstI. After cloning, the recombinant plasmids were
introduced in the E.coli host W3110. Individual clones were grown overnight at 37°C in L-broth
with 50µl/ml ampicillin.

G) Expression 

Each ORF cloned into the expression vector was transformed into the strain suitable for expression
of the recombinant protein product. 1µl of each construct was used to transform 30µl E.coli
BL21 (pGEX vector), E.coli TOP10 (pTRC vector) or E.coli BL21-DE3 (pET vector), as described
above. In the case of the pGEX-His vector, the same E.coli strain (W3110) was used for initial
cloning and expression. Single recombinant colonies were inoculated into 2ml LB+Amp
(100µg/ml), incubated at 37°C overnight, then diluted 1:30 in 20ml of LB+Amp (100µg/ml) in
100ml flasks, making sure that the OD600 ranged between 0.1 and 0.15. The flasks were incubated
at 30°C in gyratory water bath shakers until OD indicated exponential growth suitable for
induction of expression (0.4-0.8 OD for pET and pTRC vectors; 0.8-1 OD for pGEX and pGEX-His
vectors). For the pET, pTRC and pGEX-His vectors, the protein expression was induced by
addition of 1mM IPTG, whereas in the case of pGEX system the final concentration of IPTG was
0.2mM. After 3 hours incubation at 30°C, the final concentration of the sample was checked by
OD. In order to check expression, 1ml of each sample was removed, centrifuged in a microfuge,
the pellet resuspended in PBS, and analysed by 12% SDS-PAGE with Coomassie Blue staining.
The whole sample was centrifuged at 6000g and the pellet resuspended in PBS for further use.

H) GST-fusion proteins large-scale purification. 

A single colony was grown overnight at 37°C on LB+Amp agar plate. The bacteria were inoculated
into 20ml of LB+Amp liquid culture in a water bath shaker and grown overnight. Bacteria were
diluted 1:30 into 600ml of fresh medium and allowed to grow at the optimal temperature (20-37°C)
to OD550 0.8-1. Protein expression was induced with 0.2mM IPTG followed by three hours
incubation. The culture was centrifuged at 8000rpm at 4°C. The supernatant was discarded and the
bacterial pellet was resuspended in 7.5ml cold PBS. The cells were disrupted by sonication on ice
for 30 sec at 40W using a Branson sonifier B-15, frozen and thawed twice and centrifuged again.
The supernatant was collected and mixed with 150µl Glutathione-Sepharose 4B resin (Pharmacia)
(previously washed with PBS) and incubated at room temperature for 30 minutes. The sample was
centrifuged at 100g for 5 minutes at 4°C. The resin was washed twice with 10ml cold PBS for 10
minutes, resuspended in 1ml cold PBS, and loaded on a disposable column. The resin was washed
twice with 2ml cold PBS until the flow-through reached OD280 of 0.02-0.06. The GST-fusion
protein was eluted by addition of 700µl cold Glutathione elution buffer (10mM reduced
glutathione, 50mM Tris-HCl) and fractions collected until the OD280 was 0.1. 21µl of each fraction
were loaded on a 12% SDS gel using either Biorad SDS-PAGE Molecular weight standard broad
range (M1) (200, 116.25, 97.4, 66.2, 45, 31, 21.5, 14.4, 6.5 kDa) or Amersham Rainbow Marker
(M2) (220, 66, 46, 30, 21.5, 14.3 kDa) as standards. As the MW of GST is 26kDa, this value must
be added to the MW of each GST-fusion protein.

I) His-fusion solubility analysis (ORFs 111-129)

To analyse the solubility of the His-fusion expression products, pellets of 3ml cultures were
resuspended in buffer M1 [500µl PBS pH 7.2]. 25µl lysozyme (10mg/ml) was added and the
bacteria were incubated for 15 min at 4°C. The pellets were sonicated for 30 sec at 40W using a
Branson sonifier B-15, frozen and thawed twice and then separated again into pellet and
supernatant by a centrifugation step. The supernatant was collected and the pellet was resuspended
in buffer M2 [8M urea, 0.5M NaCl, 20mM imidazole and 0.1M NaH2PO4] and incubated for 3 to
4 hours at 4°C. After centrifugation, the supernatant was collected and the pellet was resuspended
in buffer M3 [6M guanidinium-HCl, 0.5M NaCl, 20mM imidazole and 0.1M NaH2PO4] overnight
at 4°C. The supernatants from all steps were analysed by SDS-PAGE.

The proteins expressed from ORFs 113, 119 and 120 were found to be soluble in PBS, whereas
ORFs 111, 122, 126 and 129 needed urea and ORFs 125 and 127 needed guanidinium-HCl for their
solubilization.

J) His-fusion large-scale purification. 

A single colony was grown overnight at 37°C on a LB + Amp agar plate. The bacteria were
inoculated into 20ml of LB+Amp liquid culture and incubated overnight in a water bath shaker.
Bacteria were diluted 1:30 into 600ml fresh medium and allowed to grow at the optimal
temperature (20-37°C) to OD550 0.6-0.8. Protein expression was induced by addition of 1mM IPTG
and the culture further incubated for three hours. The culture was centrifuged at 8000rpm at 4°C,
the supernatant was discarded and the bacterial pellet was resuspended in 7.5ml of either (i) cold
buffer A (300mM NaCl, 50mM phosphate buffer, 10mM imidazole, pH 8) for soluble proteins or
(ii) buffer B (urea 8M, 10mM Tris-HCl, 100mM phosphate buffer, pH 8.8) for insoluble proteins.

The cells were disrupted by sonication on ice for 30 sec at 40 W using a Branson sonifier B-15, 
frozen and thawed two times and centrifuged again. 

For insoluble proteins, the supernatant was stored at -20°C, while the pellets were resuspended in 2ml
buffer C (6M guanidine hydrochloride, 100mM phosphate buffer, 10mM Tris-HCl, pH 7.5) and
treated in a homogenizer for 10 cycles. The product was centrifuged at 13000rpm for 40 minutes.

Supernatants were collected and mixed with 150µl Ni2+-resin (Pharmacia) (previously washed with
either buffer A or buffer B, as appropriate) and incubated at room temperature with gentle agitation
for 30 minutes. The sample was centrifuged at 700g for 5 minutes at 4°C. The resin was washed
twice with 10ml buffer A or B for 10 minutes, resuspended in 1ml buffer A or B and loaded on a
disposable column. The resin was washed at either (i) 4°C with 2ml cold buffer A or (ii) room
temperature with 2ml buffer B, until the flow-through reached OD280 of 0.02-0.06.

The resin was washed with either (i) 2ml cold 20mM imidazole buffer (300mM NaCl, 50mM
phosphate buffer, 20mM imidazole, pH 8) or (ii) buffer D (urea 8M, 10mM Tris-HCl, 100mM
phosphate buffer, pH 6.3) until the flow-through reached the OD280 of 0.02-0.06. The His-fusion
protein was eluted by addition of 700µl of either (i) cold elution buffer A (300mM NaCl, 50mM
phosphate buffer, 250mM imidazole, pH 8) or (ii) elution buffer B (urea 8M, 10mM Tris-HCl,
100mM phosphate buffer, pH 4.5) and fractions collected until the OD280 was 0.1. 21µl of each
fraction were loaded on a 12% SDS gel.

K) His-fusion proteins renaturation 

10% glycerol was added to the denatured proteins. The proteins were then diluted to 20µg/ml using
dialysis buffer I (10% glycerol, 0.5M arginine, 50mM phosphate buffer, 5mM reduced glutathione,
0.5mM oxidised glutathione, 2M urea, pH 8.8) and dialysed against the same buffer at 4°C for
12-14 hours. The protein was further dialysed against dialysis buffer II (10% glycerol, 0.5M arginine,
50mM phosphate buffer, 5mM reduced glutathione, 0.5mM oxidised glutathione, pH 8.8) for 12-14
hours at 4°C. Protein concentration was evaluated using the formula:

Protein (mg/ml) = (1.55 x OD280) - (0.76 x OD260)
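
The concentration formula can be applied directly to a pair of absorbance readings, as in this minimal Python sketch. The function name and the example readings are hypothetical.

```python
def protein_mg_per_ml(od280, od260):
    """Protein (mg/ml) = (1.55 x OD280) - (0.76 x OD260)."""
    return 1.55 * od280 - 0.76 * od260


# Hypothetical readings for a dialysed sample.
print(round(protein_mg_per_ml(od280=1.0, od260=0.5), 2))
```

The OD260 term corrects for nucleic acid contamination, so a sample with a high OD260 relative to OD280 gives a correspondingly lower protein estimate.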
L) His-fusion large-scale purification (ORFs 111-129)

500ml of bacterial cultures were induced and the fusion proteins were obtained soluble in buffer
M1, M2 or M3 using the procedure described above. The crude extract of the bacteria was loaded
onto a Ni-NTA superflow column (Qiagen) equilibrated with buffer M1, M2 or M3 depending
on the solubilization buffer of the fusion proteins. Unbound material was eluted by washing the
column with the same buffer. The specific protein was eluted with the corresponding buffer
containing 500mM imidazole and dialysed against the corresponding buffer without imidazole.
After each run the columns were sanitized by washing with at least two column volumes of 0.5 M
sodium hydroxide and reequilibrated before the next use.

M) Mice immunisations

20µg of each purified protein were used to immunise mice intraperitoneally. In the case of ORFs
2, 4, 15, 22, 27, 28, 37, 76, 89 and 97, Balb-C mice were immunised with Al(OH)3 as adjuvant on
days 1, 21 and 42, and immune response was monitored in samples taken on day 56. For ORFs 44,
106 and 132, CD1 mice were immunised using the same protocol. For ORFs 25 and 40, CD1 mice
were immunised using Freund's adjuvant, rather than Al(OH)3, and the same immunisation
protocol was used, except that the immune response was measured on day 42, rather than 56.
Similarly, for ORFs 23, 32, 38 and 79, CD1 mice were immunised with Freund's adjuvant, but the
immune response was measured on day 49.

N) ELISA assay (sera analysis) 

The acapsulated MenB M7 strain was plated on chocolate agar plates and incubated overnight at
37°C. Bacterial colonies were collected from the agar plates using a sterile dacron swab and
inoculated into 7ml of Mueller-Hinton Broth (Difco) containing 0.25% Glucose. Bacterial growth
was monitored every 30 minutes by following OD620. The bacteria were let to grow until the OD
reached the value of 0.3-0.4. The culture was centrifuged for 10 minutes at 10000rpm. The
supernatant was discarded and bacteria were washed once with PBS, resuspended in PBS
containing 0.025% formaldehyde, and incubated for 2 hours at room temperature and then
overnight at 4°C with stirring. 100µl bacterial cells were added to each well of a 96 well Greiner
plate and incubated overnight at 4°C. The wells were then washed three times with PBT washing
buffer (0.1% Tween-20 in PBS). 200µl of saturation buffer (2.7% Polyvinylpyrrolidone 10 in
water) was added to each well and the plates incubated for 2 hours at 37°C. Wells were washed
three times with PBT. 200µl of diluted sera (Dilution buffer: 1% BSA, 0.1% Tween-20, 0.1% NaN3
in PBS) were added to each well and the plates incubated for 90 minutes at 37°C. Wells were
washed three times with PBT. 100µl of HRP-conjugated rabbit anti-mouse (Dako) serum diluted
1:2000 in dilution buffer were added to each well and the plates were incubated for 90 minutes at
37°C. Wells were washed three times with PBT buffer. 100µl of substrate buffer for HRP (25ml
of citrate buffer pH5, 10mg of O-phenildiamine and 10µl of H2O) were added to each well and the
plates were left at room temperature for 20 minutes. 100µl H2SO4 was added to each well and OD490
was followed. The ELISA was considered positive when OD490 was 2.5 times the respective
pre-immune sera.
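
The positivity criterion can be expressed as a one-line comparison, sketched here in Python. The function name and the example readings are hypothetical, and the sketch assumes that "2.5 times" is meant as "at least 2.5 times" the pre-immune OD490.

```python
def elisa_is_positive(od490_immune, od490_preimmune, factor=2.5):
    """Positive when the immune serum OD490 is at least `factor` times pre-immune.

    Treating the criterion as a threshold (>=) is an assumption of this sketch.
    """
    return od490_immune >= factor * od490_preimmune


# Hypothetical readings.
print(elisa_is_positive(od490_immune=1.0, od490_preimmune=0.3))
print(elisa_is_positive(od490_immune=0.5, od490_preimmune=0.3))
```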

O) FACScan bacteria Binding Assay procedure.

The acapsulated MenB M7 strain was plated on chocolate agar plates and incubated overnight at
37°C. Bacterial colonies were collected from the agar plates using a sterile dacron swab and
inoculated into 4 tubes containing 8ml each Mueller-Hinton Broth (Difco) containing 0.25%
glucose. Bacterial growth was monitored every 30 minutes by following OD620. The bacteria were
let to grow until the OD reached the value of 0.35-0.5. The culture was centrifuged for 10 minutes
at 4000rpm. The supernatant was discarded and the pellet was resuspended in blocking buffer (1%
BSA, 0.4% NaN3) and centrifuged for 5 minutes at 4000rpm. Cells were resuspended in blocking
buffer to reach OD520 of 0.07. 100µl bacterial cells were added to each well of a Costar 96 well
plate. 100µl of diluted (1:200) sera (in blocking buffer) were added to each well and plates
incubated for 2 hours at 4°C. Cells were centrifuged for 5 minutes at 4000rpm, the supernatant
aspirated and cells washed by addition of 200µl/well of blocking buffer in each well. 100µl of
R-Phycoerythrin conjugated F(ab)2 goat anti-mouse, diluted 1:100, was added to each well and plates
incubated for 1 hour at 4°C. Cells were spun down by centrifugation at 4000rpm for 5 minutes and
washed by addition of 200µl/well of blocking buffer. The supernatant was aspirated and cells
resuspended in 200µl/well of PBS, 0.25% formaldehyde. Samples were transferred to FACScan
tubes and read. The conditions for FACScan setting were: FL1 on, FL2 and FL3 off; FSC-H
threshold: 92; FSC PMT Voltage: E 02; SSC PMT: 474; Amp. Gains 7.1; FL-2 PMT: 539;
compensation values: 0.
P) OMV preparations

Bacteria were grown overnight on 5 GC plates, harvested with a loop and resuspended in 10 ml 20mM
Tris-HCl. Heat inactivation was performed at 56°C for 30 minutes and the bacteria disrupted by
sonication for 10 minutes on ice (50% duty cycle, 50% output). Unbroken cells were removed by
centrifugation at 5000g for 10 minutes and the total cell envelope fraction recovered by centrifugation
at 50000g at 4°C for 75 minutes. To extract cytoplasmic membrane proteins from the crude outer
membranes, the whole fraction was resuspended in 2% sarkosyl (Sigma) and incubated at room
temperature for 20 minutes. The suspension was centrifuged at 10000g for 10 minutes to remove
aggregates, and the supernatant further ultracentrifuged at 50000g for 75 minutes to pellet the outer
membranes. The outer membranes were resuspended in 10mM Tris-HCl, pH8 and the protein
concentration measured by the Bio-Rad Protein assay, using BSA as a standard.

Q) Whole Extracts preparation 

Bacteria were grown overnight on a GC plate, harvested with a loop and resuspended in 1ml of 
20mM Tris-HCl. Heat inactivation was performed at 56°C for 30 minutes. 

R) Western blotting

Purified proteins (500ng/lane), outer membrane vesicles (5µg) and total cell extracts (25µg) derived
from MenB strain 2996 were loaded on 15% SDS-PAGE and transferred to a nitrocellulose
membrane. The transfer was performed for 2 hours at 150mA at 4°C, in transferring buffer (0.3%
Tris base, 1.44% glycine, 20% methanol). The membrane was saturated by overnight incubation
at 4°C in saturation buffer (10% skimmed milk, 0.1% Triton X100 in PBS). The membrane was
washed twice with washing buffer (3% skimmed milk, 0.1% Triton X100 in PBS) and incubated
for 2 hours at 37°C with mice sera diluted 1:200 in washing buffer. The membrane was washed
twice and incubated for 90 minutes with a 1:2000 dilution of horseradish peroxidase labelled
anti-mouse Ig. The membrane was washed twice with 0.1% Triton X100 in PBS and developed with
the Opti-4CN Substrate Kit (Bio-Rad). The reaction was stopped by adding water.

S) Bactericidal assay 

MC58 strain was grown overnight at 37°C on chocolate agar plates. 5-7 colonies were collected and
used to inoculate 7ml Mueller-Hinton broth. The suspension was incubated at 37°C on a nutator
and let to grow until OD620 was 0.5-0.8. The culture was aliquoted into sterile 1.5ml Eppendorf
tubes and centrifuged for 20 minutes at maximum speed in a microfuge. The pellet was washed
once in Gey's buffer (Gibco) and resuspended in the same buffer to an OD620 of 0.5, diluted
1:20000 in Gey's buffer and stored at 25°C.

50µl of Gey's buffer/1% BSA was added to each well of a 96-well tissue culture plate. 25µl of
diluted mice sera (1:100 in Gey's buffer/0.2% BSA) were added to each well and the plate
incubated at 4°C. 25µl of the previously described bacterial suspension were added to each well.
25µl of either heat-inactivated (56°C waterbath for 30 minutes) or normal baby rabbit complement
were added to each well. Immediately after the addition of the baby rabbit complement, 22µl of
each sample/well were plated on Mueller-Hinton agar plates (time 0). The 96-well plate was
incubated for 1 hour at 37°C with rotation and then 22µl of each sample/well were plated on
Mueller-Hinton agar plates (time 1). After overnight incubation the colonies corresponding to time
0 and time 1 hour were counted.

Table II (page 493) gives a summary of the cloning, expression and purification results.
Example 1 

The following partial DNA sequence was identified in N.meningitidis <SEQ ID 1>:

1 ATGAAACAGA CAGTCAA.AT GCTTGCCGCC GCCCTGATTG CCTTGGGCTT 

51 GAACCGACCG GTGTGGNCGG ATGACGTATC GGATTTTCGG GAAAACTTGC 

101 A.GCGGCAGC ACAGGGAAAT GCAGCAGCCC AATACAATTT GGGCGCAATG 

151 TAT.TACAAA GGACGCGCGT GCGCCGGGAT GATGCTGAAG CGGTCAGATG 

201 GTATCGGCAG CCGGCGGAAC AGGGGTTAGC CCAAGCCCAA TACAATTTGG 

251 GCTGGATGTA TGCCAACGGG CGCGC.GTGC GCCAAGATGA TACCGAAGCG 

301 GTCAGATGGT ATCGGCAGGC GGCAGCGCAG GGGGTTGTCC AAGCCCAATA 

351 CAATTTGGGC GTGATATATG CCGAAGGACG TGGAGTGCGC CAAGACGATG 

401 TCGAAGCGGT CAGATGGTTT CGGCAGGCGG CAGCGCAGGG GGTAGCCCAA 

451 GCCCAAAACA ATTTGGGCGT GATGTATGCC GAAAGANCGC GCGTGCGCCA 

501 AGACCG... 

This corresponds to the amino acid sequence <SEQ ID 2; ORF37>: 

1 MKQTVXMLAA ALIALGLNRP VWXDDVSDFR ENLXAAAQGN AAAQYNLGAM 

51 YXQRTRVRRD DAEAVRWYRQ PAEQGLAQAQ YNLGWMYANG RXVRQDDTEA 

101 VRWYRQAAAQ GVVQAQYNLG VIYAEGRGVR QDDVEAVRWF RQAAAQGVAQ

151 AQNNLGVMYA ERXRVRQD...

Further work revealed the complete nucleotide sequence <SEQ ID 3>: 

1 ATGAAACAGA CAGTCAAATG GCTTGCCGCC GCCCTGATTG CCTTGGGCTT 

51 GAACCGAGCG GTGTGGGCGG ATGACGTATC GGATTTTCGG GAAAACTTGC 

101 AGGCGGCAGC ACAGGGAAAT GCAGCAGCCC AATACAATTT GGGCGCAATG 

151 TATTACAAAG GACGCGGCGT GCGCCGGGAT GATGCTGAAG CGGTCAGATG 

201 GTATCGGCAG GCGGCGGAAC AGGGGTTAGC CCAAGCCCAA TACAATTTGG 

251 GCTGGATGTA TGCCAACGGG CGCGGCGTGC GCCAAGATGA TACCGAAGCG 

301 GTCAGATGGT ATCGGCAGGC GGCAGCGCAG GGGGTTGTCC AAGCCCAATA 

351 CAATTTGGGC GTGATATATG CCGAAGGACG TGGAGTGCGC CAAGACGATG 

401 TCGAAGCGGT CAGATGGTTT CGGCAGGCGG CAGCGCAGGG GGTAGCCCAA 

451 GCCCAAAACA ATTTGGGCGT GATGTATGCC GAAAGACGCG GCGTGCGCCA 

501 AGACCGCGCC CTTGCACAAG AATGGTTTGG CAAGGCTTGT CAAAACGGAG 

551 ACCAAGACGG CTGCGACAAT GACCAACGCC TGAAGGCGGG TTATTGA 
This corresponds to the amino acid sequence <SEQ ID 4; ORF37-1>: 

1 MKQTVKWLAA ALIALGLNRA VWA DDVSDFR ENLQAAAQGN AAAQYNLGAM 

51 YYKGRGVRRD DAEAVRWYRQ AAEQGLAQAQ YNLGWMYANG RGVRQDDTEA 

101 VRWYRQAAAQ GVVQAQYNLG VIYAEGRGVR QDDVEAVRWF RQAAAQGVAQ 

151 AQNNLGVMYA ERRGVRQDRA LAQEWFGKAC QNGDQDGCDN DQRLKAGY* 

Further work identified the corresponding gene in strain A of N.meningitidis <SEQ ID 5>: 

1 ATGAAACAGA CAGTCAAATG GCTTGCCGCC GCCCTGATTG CCTTGGGCTT 

51 GAACCAAGCG GTGTGGGCGG ATGACGTATC GGATTTTCGG GAAAACTTGC 

101 AGGCGGCAGC ACAGGGAAAT GCAGCAGCCC AAAACAATTT GGGCGTGATG 

151 TATGCCGAAA GACGCGGCGT GCGCCAAGAC CGCGCCCTTG CACAAGAATG 

201 GCTTGGCAAG GCTTGTCAAA ACGGATACCA AGACAGCTGC GACAATGACC 

251 AACGCCTGAA AGCGGGTTAT TGA 

This encodes a protein having amino acid sequence <SEQ ID 6; ORF37a>: 

1 MKQTVKWLAA ALIALGLNQA VWA DDVSDFR ENLQAAAQGN AAAQNNLGVM 
51 YAERRGVRQD RALAQEWLGK ACQNGYQDSC DNDQRLKAGY * 

The originally-identified partial strain B sequence (ORF37) shows 68.0% identity over a 75aa 
overlap with ORF37a: 

10 20 30 40 50 60 

20 orf37 pep MKQTVXMLAAALIALGLNRPVWX DDVSDFRENLXAAAQGNAAAQYNLGAMYXQRTRVRRD 

I t I I i I I I I M I t I t I : II I I I M I I I j t I I I I M I I t I I t I : I I : t I t : I 
orf37a MKQTVKWLAAALIALGLNQAVWA DDVS DFRENLQAAAQGNAAAQNNLGVMYAERRGVRQD 

10 20 30 40 50 60 

25 70 80 90 100 110 120 

orf 37 , pep DAEAVRWYRQPAEQGLAQAQYNLGWMYANGRXVRQDDTEAVRWYRQAAAQGWQAQYNLG 

11:1 : : : I 
or f 3 7 a RALAQEWLGKACQNG YQDSCDNDQRLKAGYX 

70 80 90 
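The identity figures quoted for each overlap can be reproduced from a pair of aligned sequences with a sketch like the one below. The gap character `-` and the choice to exclude gapped columns from the overlap length are assumptions made for illustration; the alignment program used in this document may count the overlap differently.

```python
def identity_over_overlap(row_a: str, row_b: str) -> tuple[int, int, float]:
    """Return (identical positions, overlap length, percent identity)
    for two equal-length aligned rows. Gapped columns are skipped."""
    if len(row_a) != len(row_b):
        raise ValueError("aligned rows must have the same length")
    identical = overlap = 0
    for a, b in zip(row_a, row_b):
        if a == "-" or b == "-":
            continue  # assumption: gaps do not count towards the overlap
        overlap += 1
        if a == b:
            identical += 1
    if overlap == 0:
        raise ValueError("no overlapping positions")
    return identical, overlap, 100.0 * identical / overlap


# First 23 aligned residues of ORF37 and ORF37a.
ident, length, pct = identity_over_overlap("MKQTVXMLAAALIALGLNRPVWX",
                                           "MKQTVKWLAAALIALGLNQAVWA")
print(ident, length, round(pct, 1))
```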

Further work identified the corresponding gene in N.gonorrhoeae <SEQ ID 7>: 

1 ATGAAACAGA CAGTCAAATG GCTTGCCGCC GCCCTGATTG CCTTGGGCTT 

51 GAACCAAGCG GTGTGGGCGG GTGACGTATC GGATTTTCGG GAAAACTTGC 

101 AGgcggcaGA ACaggGAAAT GCAGCAGCCC AATTCAATTT GGGCGTGATG 

151 TATGAAAATG GACAAGGAGT TCGTCAAGAT TATGTACAGG CAGTGCAGTG 

201 GTATCGCAAG GCTTCAGAAC AAGGGGATGC CCAAGCCCAA TACAATTTGG 

251 GCTTGATGTA TTACGATGGA CGCGGCGTGC GCCAAGACCT TGCGCTCGCT 

301 CAACAATGGC TTGGCAAGGC TTGTCAAAAC GGAGACCAAA ACAGCTGCGA 

351 CAATGACCAA CGCCTGAAGG CGGGTTATTA A 

This encodes a protein having amino acid sequence <SEQ ID 8; ORF37ng>: 

1 MKQTVKWLAA ALIALGLNQA VWA GDVSDFR ENLQAAEQGN AAAQFNLGVM 

51 YENGQGVRQD YVQAVQWYRK ASEQGDAQAQ YNLGLMYYDG RGVRQDLALA 
101 QQWLGKACQN GDQNSCDNDQ RLKAGY* 

The originally-identified partial strain B sequence (ORF37) shows 64.9% identity over a 111aa 
overlap with ORF37ng: 

45 orf 37. pep MKQTVXMLAAALIALGLNRPVWXDDVSDFRENLXAAAQGNAAAQYNLGAMYXQRTRVRRD 60 

mil I I II I I I I 1 I t : I I IMIIIIIl II IIIMIt:MI:ll : ll:l 
orf37ng MKQTVKWLAAALIALGLNQAVWAGDVSDFRENLQAAEQGNAAAQFNLGVMYENGQGVRQD 60 

orf 37 . pep DAEAVRWYRQPAEQGLAQAQYNLGWMYANGRXVRQDDTEAVRWYRQAAAQGWQAQYNLG 120 
50 : : II : I I I : : II I I It I I II I I I : II ! II I : I : I : I : I 

or f 3 7 ng YVQAVQWYRKASEQGDAQAQYNLGLMYYDGRGVRQDLALAQQWLGKACQNGDQNSCDNDQ 120 

orf37.pep VIYAEGRGVRQDDVEAVRWFRQAAAQGVAQAQNNLGVMYAERXRVRQD 168 

orf37ng RLKAGY 126 
The complete strain B sequence (ORF37-1) and ORF37ng show 51.5% identity in 198 aa overlap: 



10 20 30 40 50 60 

orf37-1.pep MKQTVKWLAAALIALGLNRAVWADDVSDFRENLQAAAQGNAAAQYNLGAMYYKGRGVRRD 

I I I I I I I I i I I I I I I I I I : t I I I I i I I 1 I I I I i i I 1111111:111:11 : I : I ) I : I 
5 orf37ng MKQTVKWLAAALIALGLNQAVWAGDVSDFRENLQAAEQGNAAAQFNLGVMYENGQGVRQD 

10 20 30 40 50 60 

70 80 90 100 110 120 

orf37-1.pep DAEAVRWYRQAAEQGLAQAQYNLGWMYANGRGVRQDDTEAVRWYRQAAAQGVVQAQYNLG 
10 ::||:ltl:t:ili tltlllll 11 :lllilil 

or f 3 7 ng YVQAVQWYRKASEQGDAQAQYNLGLMYYDGRGVRQD 

70 80 90 

130 140 150 160 170 180 

1 5 or f 37 -1 . pep VI YAEGRGVRQDDVEAVRWFRQAAAQGVAQAQNNLGVMYAERRGVRQDRALAQEWFGKAC 

I I t I : I : I I I t 

orf37ng LALAQQWLGKAC 

100 

20 190 199 

orf 37-1 . pep QNGDQDGCDNDQRLKAGYX 
||lll::intlltlMII 
orf37ng QNGDQNSCDNDQRLKAGYX 



110 120 

Computer analysis of these amino acid sequences indicates a putative leader sequence, and it was 
predicted that the proteins from N.meningitidis and N.gonorrhoeae, and their epitopes, could be 
useful antigens for vaccines or diagnostics, or for raising antibodies. 

ORF37-1 (11kDa) was cloned in pET and pGex vectors and expressed in E.coli, as described 
above. The products of protein expression and purification were analyzed by SDS-PAGE. Figure 
1A shows the results of affinity purification of the GST-fusion protein, and Figure 1B shows the 
results of expression of the His-fusion in E.coli. Purified GST-fusion protein was used to immunise 
mice, whose sera were used for ELISA (positive result), FACS analysis (Figure 1C), and a 
bactericidal assay (Figure 1D). These experiments confirm that ORF37-1 is a surface-exposed 
protein, and that it is a useful immunogen. 

Figure 1E shows plots of hydrophilicity, antigenic index, and AMPHI regions for ORF37-1. 
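A hydrophilicity plot of this kind is typically a sliding-window average of per-residue scores. The sketch below is illustrative only: the text does not state which scale or window was used for the figure, so the use of negated Kyte-Doolittle hydropathy values and a 7-residue window are assumptions, and the function name is hypothetical.

```python
# Kyte-Doolittle hydropathy values (positive = hydrophobic).
KYTE_DOOLITTLE = {
    "A": 1.8, "R": -4.5, "N": -3.5, "D": -3.5, "C": 2.5,
    "Q": -3.5, "E": -3.5, "G": -0.4, "H": -3.2, "I": 4.5,
    "L": 3.8, "K": -3.9, "M": 1.9, "F": 2.8, "P": -1.6,
    "S": -0.8, "T": -0.7, "W": -0.9, "Y": -1.3, "V": 4.2,
}


def hydrophilicity_profile(seq: str, window: int = 7) -> list[float]:
    """Mean hydrophilicity (negated hydropathy) for each window position.

    Unknown residues such as X are scored as 0.0 (an assumption).
    """
    seq = "".join(seq.split()).upper().rstrip("*")
    if window < 1 or window > len(seq):
        raise ValueError("window must be between 1 and the sequence length")
    scores = [-KYTE_DOOLITTLE.get(aa, 0.0) for aa in seq]
    return [sum(scores[i:i + window]) / window
            for i in range(len(seq) - window + 1)]


# First 20 residues of ORF37-1; the hydrophobic leader gives negative values.
profile = hydrophilicity_profile("MKQTVKWLAA ALIALGLNRA")
print(len(profile), min(profile) < 0)
```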



Example 2 

The following partial DNA sequence was identified in N.meningitidis <SEQ ID 9>: 

TTCGGCGA CATCGGCGGT TTGAAGGTCA ATGCCCCCGT CAAATCCGCA 
GGCGTATTGG TCGGGCGCGT CGGCGCTATC GGACTTGACC CGAAATCCTA 

TCAGGCGAGG GTGCGCCTCG ATTTGGACGG CAAGTATCAG TTCAGCAGCG 

ACGTTTCCGC GCAAATCCTG ACTTCsGGAC TTTTGGGCGA GCAGTACATC 
GGGCTGCAGC AGGGCGGCGA CACGGAAAAC CTTGCTGCCG GCGACACCAT 
CTCCGTAACC AGTTCTGCAA TGGTTCTGGA AAACCTTATC GGCAAATTCA 
TGACGAGTTT TGCCGAGAAA AATGCCGACG GCGGCAATGC GGAAAAAGCC 

GCCGAATAA 

This corresponds to the amino acid sequence <SEQ ID 10>: 



1 FGDIGGLKVN APVKSAGVLV GRVGAIGLDP KSYQARVRLD LDGKYQFSSD 
51 VSAQILTSGL LGEQYIGLQQ GGDTENLAAG DTISVTSSAM VLENLIGKFM 



101 TSFAEKNADG GNAEKAAE* 

Computer analysis of this amino acid sequence gave the following results: 

Homology with a hypothetical H.influenzae protein (yrbd.haein; accession number p45029) 
SEQ ID 9 and yrbd.haein show 48.4% aa identity in 122 aa overlap: 

20 30 40 50 60 70 

yrbd . h LGIGALVFLGLRVANVQGFAETKSYTVTATFDNIGGLKVRAPLKIGGWIGRVSAITLDE 

|::tlllll:ll:l : I I : : I I I : 1 I : I I 
N . m FGDIGGLKVNAPVKSAGVLVGRVGAIGLDP 

10 20 30 



80 90 100 110 120 130 

yrbd . h KSYLPKVSIAINQEYNEIPENSSLSIKTSGLLGEQY lALTMGFDDGDTAMLKNGSQIQDT 
I I I : : i : : :: : : t : : : : : I t I I I I i I I i I I : I t I t I : t : I : t I 

N.m KSYQARVRLDLDGKY-QFSSDVSAQILTSGLLGEQYIGLQQG GDTENLAAGDTISVT 

15 40 50 60 70 80 

140 150 160 

yrbd . h TSAMVLEDLIGQFL— YGSKKSDGNEKSESTEQ 
:||[|M:|II:|: :::|::|l:: ::::|: 
20 N.m SSAMVLENLIGKFMTSFAEKNADGGNAEKAAEX 

90 100 110 120 

Homology with a predicted ORF from N.gonorrhoeae 

SEQ ID 9 shows 99.2% identity over a 118aa overlap with a predicted ORF from N.gonorrhoeae: 

25 20 30 40 50 60 70 

yrbd GAAAVAFLAFRVAGGAAFGGSDKTYAVYADFGDIGGLKVNAPVKSAGVLVGRVGAIGLDP 

I I t I I I I I I I I I t I I M I M I I M I I I I M 
N . m FGDIGGLKVNAPVKSAGVLVGRVGAIGLDP 

10 20 30 

30 

80 90 100 110 120 130 

yrbd KSYQARVRLDLDGKYQFSSDVSAQILTSGLLGEQYIGLQQGGDTENXiAAGDTISVTSSAM 
t i n I I I I I 1 I M I I 1 I I I I I I I I I I I I I I I I I M I I I I I t M I I I I I I M I I I I M M I 
N.m KSYQARVRLDLDGCTQFSSDVSAQILTSGLLGEQYIGLQQGGDTENLAAGDTISVTSSAM 
35 40 50 60 70 80 90 

140 150 160 

yrbd VLENLIGKFMTSFAEKNAEGGNAEKAAEX 
i I I t I I I I I t I I i I I I I I : I I I I I t i I I I 
40 N.m VLENLIGKFMTSFAEKNADGGNAEKAAEX 

100 110 120 

The complete yrbd H.influenzae sequence has a leader sequence and it is expected that the 
full-length homologous N.meningitidis protein will also have one. This suggests that it is either a 
membrane protein, a secreted protein, or a surface protein and that the protein, or one of its 
epitopes, could be a useful antigen for vaccines or diagnostics. 

Example 3 

The following partial DNA sequence was identified in N.meningitidis <SEQ ID 11>: 

1 . .ATTTTGATAT ACCTCATCCG CAAGAATCTA GGTTCGCCCG TCTTCTTCTT 

51 TCAGGAACGC CCCGGAAAGG ACGGAAAACC TTTTAAAATG GTCAAATTCC 

101 GTTCCATGCG CGACGGCTTG TATTCAGACG GCATTCCGCT GCCCGACGGA 

151 GAACGCCTGA CACCGTTCGG CAAAAAACTG CGTGCCGcCA GTwTGGACGA 

201 ACTGCCTGAA TTATGGAATA TCTTAAAAGG CGAGATGAGC CTGGTCGGCC 

251 CCCGCCCGCT GCTGATGCAA TATCTGCCGC TGTACGACAA CTTCCAAAAC 

301 CGCCGCCACG AAATGAAACC CGGCATTACC GGCTGGGCGC AGGTCAACGG 
351 GCGCAACGCg CTTTCGTGGG ACGAAAAATT CGCCTGCGAT GTTTGGTATA 

401 TCGACCACTT CAGCCTGTGC CTCGACATCA AAATCCTACT GCTGACGGTT 

451 AAAAAAGTAT TAATCAAGGA AGGGATTTCC GCACAGGGCG AACA.aCCAT 

501 GCCCCCTTTC ACAGGAAAAC GCAAACTCGC CGTCGTCGGT GCGGGCGGAC 

551 ACGGAAAAGT CGTTGCCGAC CTTGCCGCCG CACTCGGCCG GTACAGGGAA 

601 ATCGTTTTTC TGGACGACCG CGCACAAGGC AGCGTCAACG GCTTTTCCGT 

651 CATCGGCACG ACGCTGCTGC TTGAAAACAG TTTATCGCCC GAACAATACG 

701 ACGTCGCCGT CGCCGTCGGC AACAACCGCA TCCGCCGCCA AATCGCCGAA 

751 AAAGCCGCCG CGCTCGGCTT CGCCCTGCCC GTACTGGTTC ATCCGGACGC 

801 GACCGTCTCG CCTTCTGCAA CAGTCGGACA AGGCAGCGTC GTTATGGCGA 

851 AAGCGGTCG.. 

This corresponds to the amino acid sequence <SEQ ID 12; ORF3>: 



1 ..ILIYLI RKNL GSPVFFFQER PGKDGKPFKM VKFRSMRDGL YSDGIPLPDG 
51 ERLTPFGKKL RAASXDELPE LWNILKGEMS LVGPRPLLMQ YLPLYDNFQN 
101 RRHEMKPGIT GWAQVNGRNA LSWDEKFACD VWYIDHFSLC LDIKILLLTV 
151 KKVLIKEGIS AQGEXTMPPF TGKRKLAVVG AGGHGKVVAD LAAALGRYRE 
201 IVFLDDRAQG SVNGFSVIGT TLLLENSLSP EQYDVAVAVG NNRIRRQIAE 
251 KAAALGFALP VLVHPDATVS PSATVGQGSV VMAKAV.. 

Further sequence analysis revealed the complete nucleotide sequence <SEQ ID 13>: 

1 ATGAGTAAAT TCTTCAAACG CCTGTTTGAC ATTGTTGCCT CCGCCTCGGG 

51 ACTGATTTTC CTCTCGCCAG TATTTTTGAT TTTGATATAC CTCATCCGCA 

101 AGAATCTAGG TTCGCCCGTC TTCTTCTTTC AGGAACGCCC CGGAAAGGAC 

151 GGAAAACCTT TTAAAATGGT CAAATTCCGT TCCATGCGCG ACGCGCTTGA 

201 TTCAGACGGC ATTCCGCTGC CCGACGGAGA ACGCCTGACA CCGTTCGGCA 

251 AAAAACTGCG TGCCGCCAGT TTGGACGAAC TGCCTGAATT ATGGAATATC 

301 TTAAAAGGCG AGATGAGCCT GGTCGGCCCC CGCCCGCTGC TGATGCAATA 

351 TCTGCCGCTG TACGACAACT TCCAAAACCG CCGCCACGAA ATGAAACCCG 

401 GCATTACCGG CTGGGCGCAG GTCAACGGGC GCAACGCGCT TTCGTGGGAC 

451 GAAAAATTCG CCTGCGATGT TTGGTATATC GACCACTTCA GCCTGTGCCT 

501 CGACATCAAA ATCCTACTGC TGACGGTTAA AAAAGTATTA ATCAAGGAAG 

551 GGATTTCCGC ACAGGGCGAA GCCACCATGC CCCCTTTCAC AGGAAAACGC 

601 AAACTCGCCG TCGTCGGTGC GGGCGGACAC GGAAAAGTCG TTGCCGACCT 

651 TGCCGCCGCA CTCGGCCGGT ACAGGGAAAT CGTTTTTCTG GACGACCGCG 

701 CACAAGGCAG CGTCAACGGC TTTTCCGTCA TCGGCACGAC GCTGCTGCTT 

751 GAAAACAGTT TATCGCCCGA ACAATACGAC GTCGCCGTCG CCGTCGGCAA 

801 CAACCGCATC CGCCGCCAAA TCGCCGAAAA AGCCGCCGCG CTCGGCTTCG 

851 CCCTGCCCGT TCTGGTTCAT CCGGACGCGA CCGTCTCGCC TTCTGCAACA 

901 GTCGGACAAG GCAGCGTCGT TATGGCGAAA GCCGTCGTAC AGGCAGGCAG 

951 CGTATTGAAA GACGGCGTGA TTGTGAACAC TGCCGCCACC GTCGATCACG 

1001 ACTGCCTGCT TAACGCTTTC GTCCACATCA GCCCAGGCGC GCACCTGTCG 

1051 GGCAACACGC ATATCGGCGA AGAAAGCTGG ATAGGCACGG GCGCGTGCAG 

1101 CCGCCAGCAG ATCCGTATCG GCAGCCGCGC AACCATTGGA GCGGGCGCAG 

1151 TCGTCGTACG CGACGTTTCA GACGGCATGA CCGTCGCGGG CAATCCGGCA 

1201 AAGCCGCTGC CGCGCAAAAA CCCCGAGACC TCGACAGCAT AA 

This corresponds to the amino acid sequence <SEQ ID 14; ORF3-1>: 



1 MSKFFKRLFD IVAS ASGLIF LSPVFLILIY LI RKNLGSPV FFFQERPGKD 

51 GKPFKMVKFR SMRDALDSDG IPLPDGERLT PFGKKLRAAS LDELPELWNI 

101 LKGEMSLVGP RPLLMQYLPL YDNFQNRRHE MKPGITGWAQ VNGRNALSWD 

151 EKFACDVWYI DHFS LCLDIK ILLLTVKKVL I KEGISAQGE ATMPPFTGKR 

201 KLAVVGAGGH GKVVADLAAA LGRYREIVFL DDRAQGSVNG FSVIGTTLLL 

251 ENSLSPEQYD VAVAVGNNRI RRQIAEKAAA LGFALPVLVH PDATVSPSAT 

301 VGQGSVVMAK AVVQAGSVLK DGVIVNTAAT VDHDCLLNAF VHISPGAHLS 

351 GNTHIGEESW IGTGACSRQQ IRIGSRATIG AGAVVVRDVS DGMTVAGNPA 

401 KPLPRKNPET STA* 

Computer analysis of this amino acid sequence gave the following results: 



Homology with a predicted ORF from N.meningitidis (strain A) 

ORF3 shows 93.0% identity over a 286aa overlap with an ORF (ORF3a) from strain A of 
N.meningitidis: 

10 20 30 
orf3.pep ILIYLI RKNLGSPVFFFQERPGKDGKPFKMVKFR 

t ) I I I t i I I I I I I I t t I I i I I I I t I t I t I I I I I i 
orf 3a MSKFFKRLFDIVAS ASGLIFLSPVFLILIYLI RKNLGSPVFFFQERPGKDGKPFKMVKFR 
10 20 30 40 50 60 



40 50 60 70 80 90 

orf 3 . pep SMRDGLYSDGIPLPDGERLTPFGKKLRAASXDELPELWNILKGEMSLVGPRPLLMQYLPL 
Itit llllllKlillMllil M I I I i t i : I I I : M I I I I I I t I I I M I t 
orf 3a SMHDALDSDGILLPDGERLTPFGKKLRAASLDELPELWNVLKGDMSLVGPRPLLMQYLPL 
70 80 90 100 110 120 

100 110 120 130 140 150 

orf 3 . pep YDNFQNRRHEMKPGITGWAQVNGRNALSWDEKFACDVWYIDHFS LCLDIKILLLTVKKVL 
IMMIIIIillliintltlllltllllll:llll:lltlltlllllllltllllllll 
orf 3a YDNFONRRHEMKPGITGWAQVNGRNALSWDERFACDIWYIDHFS LCLDIKILLLTVKKVL 
130 140 150 160 170 180 



160 170 180 190 200 210 

orf 3 . pep IKEGISAQGEXTMPPFTGKRKLAWGAGGHGKWADLAAALGRYREIVFLDDRAQGSVNG 
I I I I I I I I I I I i I I I t I I I I I t I I I I I I i I I I I I : I I I t I i I t M I I ) 1 i : I 1 M I I 
orf 3a IKEGISAOGEATMPPFTGKRKLAWGAGGHGKWAELAAALGTYGEIVFLDDRVQGSVNG 
190 200 210 220 230 240 

220 230 240 250 260 270 

or f 3 . pep FSVIGTTLLLENSLSPEQYDVAVAVGNNRIRRQIAEKAAALGFALPVLVHPDATVSPSAT 
I tlMlllitlMIII[:l:itlllllllllllltlltlllMlltl:|tl:ltlllli 
orf 3a ' FPVIGTTLLLENSLSPEQFDIAVAVGNNRIRRQIAEKAAALGFALPVLIHPDSTVSPSAT 

250 260 270 280 290 300 

280 

orf 3 . pep VGQGSWMAKAV 
I I I t : I M M I I 

orf 3a VGQGGWMAKAWQADSVLKDGVIVNTAATVDHDCLLDAFVHISPGAHLSGNTRIGEESW 
310 320 330 340 350 360 

The complete length ORF3a nucleotide sequence <SEQ ID 15> is: 

1 ATGAGTAAAT TCTTCAAACG CCTGTTTGAC ATTGTTGCCT CCGCCTCGGG 

51 ACTGATTTTC CTCTCGCCAG TATTTTTGAT TTTGATATAC CTCATCCGCA 

101 AGAATCTGGG TTCGCCCGTC TTCTTCTTTC AGGAACGCCC CGGAAAGGAC 

151 GGAAAACCTT TTAAAATGGT CAAATTCCGT TCCATGCACG ACGCGCTTGA 

201 TTCAGACGGC ATTCTGCTGC CCGACGGAGA ACGCCTGACA CCGTTCGGCA 

251 AAAAACTGCG TGCCGCCAGT TTGGACGAAC TGCCCGAACT GTGGAACGTC 

301 CTCAAAGGCG ACATGAGCCT GGTCGGCCCC CGCCCGCTGC TGATGCAATA 

351 TCTGCCGCTG TACGACAACT TCCAAAACCG CCGCCACGAA ATGAAACCGG 

401 GCATTACCGG CTGGGCGCAG GTCAACGGGC GCAACGCGCT TTCGTGGGAC 

451 GAACGCTTCG CATGCGACAT CTGGTATATC GACCACTTCA GCCTGTGCCT 

501 CGACATCAAA ATCCTACTGC TGACGGTTAA AAAAGTATTA ATCAAAGAAG 

551 GGATTTCCGC ACAGGGCGAA GCCACCATGC CCCCTTTCAC AGGAAAACGC 

601 AAACTTGCCG TCGTCGGTGC GGGCGGACAC GGCAAAGTCG TTGCCGAGCT 

651 TGCCGCCGCA CTCGGCACAT ACGGCGAAAT CGTTTTTCTG GACGACCGCG 

701 TCCAAGGCAG CGTCAACGGC TTCCCCGTCA TCGGCACGAC GCTGCTGCTT 

751 GAAAACAGTT TATCGCCCGA ACAATTCGAC ATCGCCGTCG CCGTCGGCAA 

801 CAACCGCATC CGCCGCCAAA TCGCCGAAAA AGCCGCCGCG CTCGGCTTCG 

851 CCCTGCCCGT CCTGATTCAT CCGGACTCGA CCGTCTCGCC TTCTGCAACA 

901 GTCGGACAAG GCGGCGTCGT TATGGCGAAA GCCGTCGTAC AGGCTGACAG 

951 CGTATTGAAA GACGGCGTAA TTGTGAACAC TGCCGCCACC GTCGATCACG 

1001 ATTGCCTGCT TGATGCTTTC GTCCACATCA GCCCGGGCGC GCACCTGTCG 

1051 GGCAACACGC GTATCGGCGA AGAAAGCTGG ATAGGCACAG GCGCGTGCAG 

1101 CCGCCAGCAG ATCCGTATCG GCAGCCGCGC AACCATTGGA GCGGGCGCAG 

1151 TCGTCGTGCG CGACGTTTCA GACGGCATGA CCGTCGCGGG CAACCCGGCA 

1201 AAACCATTGG CAGGCAAAAA TACCGAGACC CTGCGGTCGT AA 

This is predicted to encode a protein having amino acid sequence <SEQ ID 16>: 



1 MSKFFKRLFD IVAS ASGLIF LSPVFLILIY LI RKNLGSPV FFFQERPGKD 

51 GKPFKMVKFR SMHDALDSDG ILLPDGERLT PFGKKLRAAS LDELPELWNV 

101 LKGDMSLVGP RPLLMQYLPL YDNFQNRRHE MKPGITGWAQ VNGRNALSWD 

151 ERFACDIWYI DHFS LCLDIK ILLLTVKKVL I KEGISAQGE ATMPPFTGKR 

201 KLAVVGAGGH GKVVAELAAA LGTYGEIVFL DDRVQGSVNG FPVIGTTLLL 

251 ENSLSPEQFD IAVAVGNNRI RRQIAEKAAA LGFALPVLIH PDSTVSPSAT 
301 VGQGGVVMAK AVVQADSVLK DGVIVNTAAT VDHDCLLDAF VHISPGAHLS 
351 GNTRIGEESW IGTGACSRQQ IRIGSRATIG AGAVVVRDVS DGMTVAGNPA 
401 KPLAGKNTET LRS* 

Two transmembrane domains are underlined. 



ORF3-1 shows 94.6% identity in 410 aa overlap with ORF3a: 






10 20 30 40 50 60 

orf 3a . pep MSKFFKRLFDIVASASGLIFLSPVFLILIYLIRKNLGSPVFFFQERPGKDGKPFKMVKFR 
I I I t M I I I I I I I I I I I I t I I I [ I I I I I I t I I I I I I I I I I I i I I I I I I I I I I I I I t I [ M 
orf 3-1 MSKFFKRLFDIVASASGLIFLSPVFLILIYLIRKNLGSPVFFFQERPGKDGKPFKMVKFR 

10 20 30 40 50 60 

70 80 90 100 110 120 

orf 3a . pep SMHDALDSDGILLPDGERLTPFGKKLRAASLDELPELWNVLKGDMSLVGPRPLLMQYLPL 
I I : I I I t I I I I t I I I I I I I I t I t I I I M 1 I I M I I I I I : I I I : I i I I i I I [ I I t I I I I I 
orf 3-1 SMRDALDSDGIPLPDGERLTPFGKKLRAASLDELPELWNILKGEMSLVGPRPLLMQYLPL 

70 80 90 100 110 120 

. 130 140 150 160 170 180 

orf 3a . pep YDNFQNRRHEMKPGITGWAQVNGRNALSWDERFACDIWYIDHFSLCLDIKILLLTVKKVL 
t I I I M I t I i I I I I I t I i I I M I I I [ I I I I I : I i n : I I I I I i t i I t I I M t 1 I I I t I M 
orf 3-1 YDNFQNRRHEMKPGITGWAQVNGRNALSWDEKFACDVWYIDHFSLCLDIKILLLTVKKVL 
130 140 150 160 170 180 

190 200 210 220 230 240 

orf 3a . pep IKEGISAQGEATMPPFTGKRKLAWGAGGHGKWAELAAALGTYGEIVFLDDRVQGSVNG 
IIIIIIIIMill[lllll)MtlllltllllMI:IIIIM I 11111111:111111 
orf 3-1 IKEGISAQGEATMPPETGKRKLAWGAGGHGKWADLAAALGRYREIVFLDDRAQGSVNG 
190 200 210 220 230 240 

250 260 270 280 290 300 

or f 3a . pep FPVIGTTLLLENSLSPEQFDIAVAVGNNRIRRQIAEKAAALGFALPVLIHPDSTVSPSAT 
I I I I I I i I M I 1 I M I I : I : I I I I I I M t I I I I I I I I I I I I I I M I I : i I I : i I I M I I 
orf 3-1 FSVIGTTLLLENSLSPEQYDVAVAVGNNRIRRQIAEKAAALGFALPVLVHPDATVSPSAT 

250 260 270 280 290 30O 

310 320 330 340 350 360 

orf 3a . pep VGQGGVVMAKAWQADSVLKEX5VIVNTAATVDHDCLLDAFVHISPGAHLSGNTRIGEESW 
1111:1111111111 illlllllllllllltlIIII:MillMltllllM:|IMII 
orf 3-1 VGQGSWMAKAWQAGSVLKDGVIVNTAATVDHDCLLNAFVHISPGAHLSGNTHIGEESW 

310 320 330 340 350 360 

370 380 390 400 410 

or f 3a . pep IGTGACSRQQIRIGSRATIGAGAVWRDVSDGMTVAGNPAKPLAGKNTETLRSX 

IIIIIIIIMIIIIIMIIIIIIIIMMIIIllltililttl II M 
orf 3-1 IGTGACSRQQIRIGSRATIGAGAVWRDVSDGMTVAGNPAKPLPRKNPETSTAX 

370 380 390 400 410 



Homology with hypothetical protein encoded by yvfc gene (accession Z71928) of B.subtilis 
ORF3 and YVFC proteins show 55% aa identity in 170 aa overlap (BLASTp): 

ORF3   3 IYLIRKNLGSPVFFFQERPGKDGKPFKMVKFRSMRDGLYSDGIPLPDGERLTPFGKKLRA 62
         I ++R  +GSPVFF Q RPG  GKPF + KFR+M D   S G  LPD  RLT  G+ +R
yvfc  27 IAVVRLKIGSPVFFKQVRPGLHGKPFTLYKFRTMTDERDSKGNLLPDEVRLTKTGRLIRK 86

ORF3  63 ASXDELPELWNILKGEMSLVGPRPLLMQYLPLYDNFQNRRHEMKPGITGWAQVNGRNALS 122
          S DELP+L N+LKG++SLVGPRPLLM YLPLY   Q RRHE+KPGITGWAQ+NGRNA+S
yvfc  87 LSIDELPQLLNVLKGDLSLVGPRPLLMDYLPLYTEKQARRHEVKPGITGWAQINGRNAIS 146

ORF3 123 WDEKFACDVWYIDHFSLCLDXXXXXXXXXXXXXXEGISAQGEXTMPPFTG 172
         W++KF  DVWY+D++S  LD              EGI      T   FTG
yvfc 147 WEKKFELDVWYVDNWSFFLDLKILCLTVRKVLVSEGIQQTNHVTAERFTG 196
Homology with a predicted ORF from N.gonorrhoeae 

ORF3 shows 86.3% identity over a 286aa overlap with a predicted ORF (ORF3.ng) from 
N.gonorrhoeae: 

orf3 ILIYLI RKNLGSPVFFFQERPGKDGKPFKMVKFR 34 

: I I t I I I I i I I I I I t :: I 1 I I I I I { I I [ I 1) I I 
or f 3ng MSKAVKRLFDI IAS ASGLIVLSPVFLVLIYLI RKNKGSPVFFIRERP6KDGKPFKMVKFR 60 

orf3 SMRDGLYSDGIPLPDGERLTPFGKKLRAASXDELPELWNILKGEMSLVGPRPLUiQYLPL 94 

1111:1 I I I t I I I I : I I I i 1111111:1 t I I I I I I I : I I i I I t t ! I t t I t I t I I I I I 
orfSng SMRDALDSDGIPLPDSERLTDFGKKLRATSLDELPELWNVLKGEMSLVGPRPLLMQYLPL 120 

orf3 YDNFQNRRHEMKPGITGWAQVNGRNALSWDEKFACDVWYIDHFSLCLDIKILLLTVKKVL 154 

]::ltlllllllttllllllllltllllMIII:Mlil lliMlrliilllt 
orf3ng YNKFQNRRHEMKPGITGWAQVNGRNALSWDEKFSCDVWYTDNFSFWLDMKILFLTVKKVL 180 

orf3 IKEGISAQGEXTMPPFTGKRKLAWGAGGHGKWADLAAALGRYREIVFLDDRAQGSVNG 2X4 

I I I I t I I I t I t t I I I : I : I I I I I : I t I I I I M I I : I I I 1 I I t I t I t I I I t : I i I t I t 
orf3ng IKEGISAQGEATMPPFAGNRKLAVIGAGGHGKWAELAAALGTYGEIVFLDDRTQGSVNG 240 

orf3 FSVIGTTLLLENSLSPEQYDVAVAVGNNRIRRQIAEKAAALGFALPVLVHPDATVSPSAT 274 

i ||tllMlltll{lll:t::Mtl[lllllt|:t:lllllt t I I I : I I I [ t t I I I I 
orf3ng FPVIGTTLLLENSLSPEQFDITVAVGNNRIRRQITENAAALGFBCLPVLIHPDATVSPSAI 300 

orf3 VGQGSWMAKAV 286 

: I t I I I M I II I 

orf 3ng IGQGSWMAKAWQAGSVLKDGVIVNTAATVDHDCLLDAFVHISPGAHLSGNTRIGEESR 360 

The complete length ORF3ng nucleotide sequence <SEQ ID 17> is: 

1 ATGAGTAAAG CCGTCAAACG CCTGTTCGAC ATCATCGCAT CCGCATCGGG 

51 GCTGATTGTC CTGTCGCCCG TGTTTTTGGT TTTAATATAC CTCATCCGCA 

101 AAAACTTAGG TTCGCCCGTC TTCTTCattC GGGAACGCCc cgGAAAGGAc 

151 ggaaaacCTT TTAAAATGGT CAAATTCCGT TCCAtgcgcg acgcgcttGA 

201 TTCAGACGGC ATTCCGCTGC CCGATAGCGA ACGCCTGACC GATTTCGGCA 

251 AAAAATTACG CGCCACCAGT TTGGACGAAC TTCCTGAATT ATGGAATGTC 

301 CTCAAAGGCG AGATGAGCCT GGTCGGCCCC CGCCCGCTTT TGATGCAGTA 

351 TCTGCCGCTT TACAACAAAT TTCAAAACCG CCGCCACGAA ATGAAACCGG 

401 GCATTACCGG CTGGGCGCAG GTCAACGGGC GCAACGCGCT TTCGTGGGAC 

451 GAAAAGTTCT CCTGCGATGT TTGGTACACC GACAATTTCA GCTTTTGGCT 

501 GGATATGAAA ATCCTGTTTC TGACAGTCAA AAAAGTCTTG ATTAAAGAAG 

551 GCATTTCGGC GCAAGGGGAA GCCACCATGC CCCCTTTCGC GGGGAATCGC 

601 AAACTCGCCG TTATCGGCGC GGGCGGACAC GGCAAAGTCG TTGCCGAGCT 

651 TGCCGCCGCA CTCGGCACAT ACGGCGAAAT CGTTTTTCTG GACGACCGCA 

701 CCCAAGGCAG CGTCAACGGC TTCCCCGTCA TCGGCACGAC GCTGCTGCTT 

751 GAAAACAGTT TATCGCCCGA ACAATTCGAC ATCACCGTCG CCGTCGGCAA 

801 CAACCGCATC CGCCGCCAAA TCACCGAAAA CGCCGCCGCG CTCGGCTTCA 

851 AACTGCCCGT TCTGATTCAT CCCGACGCGA CCGTCTCGCC TTCTGCAATA 

901 ATCGGACAAG GCAGCGTCGT AATGGCGAAA GCCGTCGTAC AGGCCGGCAG 

951 CGTATTGAAA GACGGCGTGA TTGTGAACAC TGCCGCCACC GTCGATCACG 

1001 ACTGCCTGCT TGACGCTTTC GtccaCATCA GCCCGGGCGC GCACCTGTCG 

1051 GGCAACACGC GTATCGGCGA AGAAAGCCGG ATAGGCACGG GCGCGTGCAG 

1101 CCGCCAGCAG ACAACCGTCG GCAGCGGGGT TACCgccgGT GCAGGGgcGG 

1151 TTATCGTATG CGACATCCCG GACGGCATGA CCGTCGCGGG CAACCCGGCA 

1201 AAGCCCCTTA CGGGCAAAAA CCCCAAGACC GGGACGGCAT AA 



This encodes a protein having amino acid sequence <SEQ ID 18>: 



1 MSKAVKRLFD IIAS ASGLIV LSPVFLVLIY LI RKNLGSPV FFIRERPGKD 

51 GKPFKMVKFR SMRDALDSDG IPLPDSERLT DFGKKLRATS LDELPELWNV 

101 LKGEMSLVGP RPLLMQYLPL YNKFQNRRHE MKPGITGWAQ VNGRNALSWD 

151 EKFSCDVWYT DNFSFWLDMK ILFLTVKKVL IKEGISAQGE ATMPPFAGNR 

201 KLAVIGAGGH GKVVAELAAA LGTYGEIVFL DDRTQGSVNG FPVIGTTLLL 

251 ENSLSPEQFD ITVAVGNNRI RRQITENAAA LGFKLPVLIH PDATVSPSAI 

301 IGQGSVVMAK AVVQAGSVLK DGVIVNTAAT VDHDCLLDAF VHISPGAHLS 

351 GNTRIGEESR IGTGACSRQQ T TVGSGVTAG AGAVIVCDI P DGMTVAGNPA 

401 KPLTGKNPKT GTA* 
This protein shows 86.9% identity in 413 aa overlap with ORF3-1 (upper rows: orf3-1.pep; lower rows: orf3ng): 

10 20 30 40 50 60 

MSKFFKRLFDIVASASGLIFLSPVFLILIYLIRKNLGSPVFFFQERPGKDGKPFKMVKFR 
IN I I I I I I: I I I M I I lllll):lllllliltllllli::|llllllltlllllll 
MSKAVKRLFDIIASASGLIVLSPVFLVLIYLIRKNLGSPVFFIRERPGKDGKPFKMVKFR 
10 20 30 40 50 60 

70 80 90 100 110 120 

SMRDALDSDGIPLPDGERLTPFGKKLRAASLDELPELWNILKGEMSLVGPRPLLMQYLPL 
lltlllllllltMI:|[ll lllllll:lllllllt)t:ltttlllllllMIIIIIM 
SMRDALDSDGIPLPDSERLTDFGKKLRATSLDELPELWNVLKGEMSLVGPRPLLMQYLPL 
70 80 90 100 110 120 

130 140 150 160 170 180 

YDNFQNRRHEMKPGITGWAQVNGRNALSWDEKFACDVWYIDHFSLCLDIKILLLTVKKVL 
I :: I I I M I I t M I t I I t M I I I I i I I i I 1 I I I : I I I I t |: i I : I I : i I I : I I I I I I I 
YNKFQNRRHEMKPGITGWAQVNGRNALSWDEKFSCDVWYTDNFSFWLDMKILFLTVKKVL 
130 140 150 160 170 180 

190 200 210 220 230 240 

IKEGISAQGEATMPPHTGKRKLAWGAGGHGKWADIAAALGRYREIVFLDDRAQGSVNG 
iMllllllltli)ll:l:tllM:ilMltltll:|IIIM 1 I I I I I I I I : I ] i t I i 
IKEGISAQGEATMPPFAGNRKLAVIGAGGHGKWAELAAALGTYGEIVFLDDRTQGSVNG 
190 200 210 220 230 240 

250 260 270 280 290 300 

FSVIGTTLLLENSLSPEQYDVAVAVGNNRIRRQIAEKAAALGFALPVLVHPDATVSPSAT 
t ll{IIMIMIMIII:|::llllllltllll:t:lillti i I i t : I I I I I I I I f I 
FPVIGTTLLLENSLSPEQFDITVAVGNNRIRRQITENAAALGFKLPVLIHPDATVSPSAI 
250 260 270 280 290 300 

310 320 330 340 350 360 

VGQGSWMAKAWQAGSVLKDGVIVNTAATVDHDCLLNAFVHISPGAHLSGNTHIGEESW 
:|ltlillilllMtllltlllllllltllllllll]:llllltlMllilll:|lill 
IGQGSWMAKAWQAGSVLKDGVIVNTAATVDHDCLLDAFVHISPGAHLSGNTRIGEESR 
310 320 330 340 350 360 

370 380 390 400 410 

IGTGACSRQQIRIGSRATIGAGAVWRDVSDGMTVAGNPAKPLPRKNPETSTAX 
MIIMIIII :ll :| lltll:! I: i I I I I I I I I I i 1 I Ml:|:ill 
IGTGACSRCXJTTVGSGVTAGAGAVIVCDIPDGMTVAGNPAKPLTGKNPKTGTAX 
370 380 390 400 410 

In addition, ORF3ng shows significant homology with a hypothetical protein from B.subtilis: 

gnl|PID|e238668 (Z71928) hypothetical protein [Bacillus subtilis] 
>gi|1945702|gnl|PID|e313004 (Z94043) hypothetical protein [Bacillus subtilis] 

>gi|2635938|gnl|PID|e1186113 (Z99121) similar to capsular polysaccharide 
biosynthesis [Bacillus subtilis] Length = 202 

Score = 235 bits (594), Expect = 3e-61 

Identities = 114/195 (58%), Positives = 142/195 (72%) 

Query:   5 VKRLFDIIASASGLIVLSPVFLVLIYLIRKNLGSPVFFIRERPGKDGKPFKMVKFRSMRD 64
           +KRLFD+ A+   L   S + L  I ++R  +GSPVFF + RPG  GKPF + KFR+M D
Sbjct:   3 LKRLFDLTAAIFLLCCTSVIILFTIAVVRLKIGSPVFFKQVRPGLHGKPFTLYKFRTMTD 62

Query:  65 ALDSDGIPLPDSERLTDFGKKLRATSLDELPELWNVLKGEMSLVGPRPLLMQYLPLYNKF 124
             DS G  LPD  RLT  G+ +R  S+DELP+L NVLKG++SLVGPRPLLM YLPLY +
Sbjct:  63 ERDSKGNLLPDEVRLTKTGRLIRKLSIDELPQLLNVLKGDLSLVGPRPLLMDYLPLYTEK 122

Query: 125 QNRRHEMKPGITGWAQVNGRNALSWDEKFSCDVWYTDNFSFWLDMKILFLTVKKVLIKEG 184
           Q RRHE+KPGITGWAQ+NGRNA+SW++KF  DVWY DN+SF+LD+KIL LTV+KVL+ EG
Sbjct: 123 QARRHEVKPGITGWAQINGRNAISWEKKFELDVWYVDNWSFFLDLKILCLTVRKVLVSEG 182

Query: 185 ISAQGEATMPPFAGN 199
           I      T   F G+
Sbjct: 183 IQQTNHVTAERFTGS 197

The hypothetical product of the yvfc gene shows similarity to EXOY of R.meliloti, an 
exopolysaccharide production protein. Based on this and on the two predicted transmembrane 
regions in the homologous N.gonorrhoeae sequence, it is predicted that these proteins, or their 
epitopes, could be useful antigens for vaccines or diagnostics, or for raising antibodies. 

Example 4 

The following partial DNA sequence was identified in N.meningitidis <SEQ ID 19>: 

1 ..AACCATATGG CGATTGTCAT CGACGAATAC GGCGGCACAT CCGGCTTGGT 

51 CACCTTTGAA GACATCATCG AGCAAATCGT CGGCGAAATC GAAGACGAGT 

101 TTGACGAAGA CGATAGCGCC GACAATATCC ATGCCGTTTC TTCAGACACG 

151 TGGCGCATCC ATGCAGCTAC CGAAATCGAA GACATCAACA CCTTCTTCGG 

201 CACGGAATAC AGCATCGAAG AAGCCGACAC CATT.GGCGG CCTGGTCATT 

251 CAAGAGTTGG GACATCTGCC CGTGCGCGGC GAAAAAGTCC TTATCGGCGG 

301 TTTGCAGTTC ACCGTCGCAC GCGCCGACAA CCGCCGCCTG CATACGCTGA 

351 TGGCGACCCG CGTGAAGTAA GC ACCGC CGTTTCTGCA 

401 CAGTTTAG 

This corresponds to amino acid sequence <SEQ ID 20; ORF5>: 

1 ..NHMAIVIDEY GGTSGLVTFE DIIEQIVGEI EDEFDEDDSA DNIHAVSSDT 
51 WRIHAATEIE DINTFFGTEY SIEEADTIXR PGHSRVGTSA RARRKSPYRR 
101 FAVHRRTRRQ PPPAYADGDP REVS XR REXTTV* 

Further sequence analysis revealed the complete DNA sequence to be <SEQ ID 21>: 

1 ATGGACGGCG CACAACCGAA AACGAATTTT TTTGAACGCC TGATTGCCCG 

51 ACTCGCCCGC GAACCCGATT CCGCCGAAGA CGTATTAAAC CTGCTTCGGC 

101 AGGCGCACGA GCAGGAAGTT TTTGATGCGG ATACGCTTTT AAGATTGGAA 

151 AAAGTCCTCG ATTTTTCCGA TTTGGAAGTG CGCGACGCGA TGATTACGCG 

201 CAGCCGTATG AACGTTTTAA AAGAAAACGA CAGCATCGAG CGCATCACCG 

251 CCTACGTTAT CGATAGCGCC CATTCGCGCT TCCCCGTCAT CGGCGAAGAC 

301 AAAGACGAAG TTTTGGGCAT TTTGCACGCC AAAGACCTGC TCAAATATAT 

351 GTTTAACCCC GAGCAGTTCC ACCTCAAATC CATTCTCCGC CCCGCCGTCT 

401 TCGTCCCCGA AGGCAAATCG CTGACCGCCC TTTTAAAAGA GTTCCGCGAA 

451 CAGCGCAACC ATATGGCGAT TGTCATCGAC GAATACGGCG GCACATCCGG 

501 CTTGGTCACC TTTGAAGACA TCATCGAGCA AATCGTCGGC GAAATCGAAG 

551 ACGAGTTTGA CGAAGACGAT AGCGCCGACA ATATCCATGC CGTTTCTTCC 

601 GAACGCTGGC GCATCCATGC AGCTACCGAA ATCGAAGACA TCAACACCTT 

651 CTTCGGCACG GAATACAGCA GCGAAGAAGC CGACACCATT CGGCCTGGTC 

701 ATTCAAGAGT TGGGACATCT GCCCGTGCGC GGCGAAAAAG TCCTTATCGG 

751 CGGTTTGCAG TTCACCGTCG CACGCGCCGA CAACCGCCGC CTGCATACGC 

801 TGATGGCGAC CCGCGTGAAG TAAGCACCGC CGTTTCTGCA CAGTTTAGGA 

851 TGACGGTACG GGCGTTTTCT GTTTCAATCC GCCCCATCCG CCAAACATAA 

This corresponds to amino acid sequence <SEQ ID 22; ORF5-1>: 



1 MDGAQPKTNF FERLIARLAR EPDSAEDVLN LLRQAHEQEV FDADTLLRLE 

51 KVLDFSDLEV RDAMITRSRM NVLKENDSIE RITAYVIDTA HSRFPVIGED 

101 KDEVLGILHA KDLLKYMFNP EQFHLKSILR PAVFVPEGKS LTALLKEFRE 

151 QRNHMAIVID EYGGTSGLVT FEDIIEQIVG EIEDEFDEDD SADNIHAVSS 

201 ERWRIHAATE IEDINTFFGT EYSSEEADTI RPGHSRVGTS ARARRKSPYR 

251 RFAVHRRTRR QPPPAYADGD PREVSTAVSA QFRMTVRAFS VSIRPIRQT* 

Further work identified the corresponding gene in strain A of N.meningitidis <SEQ ID 23>: 



1 ATGGACGGCG CACAACCGAA AACAAATTTT TTNNAACGCC TGATTGCCCG 

51 ACTCGCCCGC GAACCCGATT CCGCCGAAGA CGTATTGACC CTGTTGCGCC 

101 AAGCGCACGA ACAGGAAGTA TTTGATGCGG ATACGCTTTT AAGATTGGAA 

151 AAAGTCCTCG ATTTTTCTGA TTTGGAAGTG CGCGACGCGA TGATTACGCG 

201 CAGCCGTATG AACGTTTTAA AAGAAAACGA CAGCATCGAA CGCATCACCG 

251 CCTACGTTAT CGATAGCGCC CATTCGCGCT TCCCCGTCAT CGGTGAAGAC 

301 AAAGACGAAG TTTTGGGTAT TTTGCACGCC AAAGACCTGC TCAAATATAT 

351 GTTCAACCCC GAGCAGTTCC ACCTCAAATC GATATTGCGC CCTGCCGTCT 
401 TCGTCCCCGA AGGCAAATCG CTGACCGCCC TTTTAAAAGA GTTCCGCGAA 

451 CAGCGCAACC ATATGGCAAT CGTCATCGAC GAATACGGCG GCACGTCGGG 

501 TTTGGTAACT TTTGAAGACA TCATCGAGCA AATCGTCGGC GACATCGAAG 

551 ATGAGTTTGA CGAAGACGAA AGCGCGGACA ACATCCACGC CGTTTCCGCC 

601 GAACGCTGGC GCATCCACGC GGCTACCGAA ATCGAAGACA TCAACGCCTT 

651 TTTCGGCACG GAATACAGCA GCGAAGAAGC CGACACCATC GGCGGCCNTG 

701 GTCATTCAGG AATTGGNACA CCTGCCCGTG CGCGGCGAAA AAGTCNTTAT 

751 CGGCGNNTTG CANTTCACNG TCGCCNGCGC NGACAACCGC CGCCTGCATA 

801 CGCTGATGGC GACCCGCGTG AAGTAAGCTC CGCCGTTTCT GTACAGTTTA 

851 GGATGACGGT ACGGGCGTTT TCTGTTTCAA TCCGCCCCAT CCGCCANACA 

901 TAA 

This encodes a protein having amino acid sequence <SEQ ID 24; ORF5a>: 



1 MDGAQPKTNF XXRLIARLAR EPDSAEDVLT LLRQAHEQEV FDADTLLRLE 
51 KVLDFSDLEV RDAMITRSRM NVLKENDSIE RITAYVIDTA HSRFPVIGED 
101 KDEVLGILHA KDLLKYMFNP EQFHLKSILR PAVFVPEGKS LTALLKEFRE 
151 QRNHMAIVID EYGGTSGLVT FEDIIEQIVG DIEDEFDEDE SADNIHAVSA 
201 ERWRIHAATE IEDINAFFGT EYSSEEADTI GGXGHSGIGT PARARRKSXY 
251 RRXAXHXRXR XQPPPAYADG DPREVSSAVS VQFRMTVRAF SVSIRPIRXT 
301 * 

The originally-identified partial strain B sequence (ORF5) shows 54.7% identity over a 124aa 
overlap with ORF5a: 



10 20 30 

orf 5 . pep NHMAIVIDEYGGTSGLVTFEDIIEQIVGEI 

I it I I i I It [ I I I i i I I I I M I t i I I I I : I 
orf 5a FHLKSILRPAVFVPEGKSLTALLKEFREQRNHMAIVIDEYGGTSGLVTFEDIIEQIVGDI 
130 140 150 160 170 180 



40 50 60 70 80 90 

orf 5 . pep EDEFDEDDSADNIHAVSSDTWRIHAATEIEDINTFFGTEYSIEEADTIXRPGHSRVGTSA 
t t i t I M : t t t t t t I t t :: t I t t I II t t I t t t : I t II t II t t t 1 II III : I I I 
orf 5a EDEFDEDESADNIHAVSAERWRIHAATEIEDINAFFGTEYSSEEADTIGGXGHSGIGTPA 
190 200 210 220 230 240 



100 110 120 130 

RARRKSPYRRFAVHRRTRRQPPPAYADGDPREVSXXXXXRRFCTV 
t I I I t I III I 1 1:1 I I I It t I I II I I ) I I 

RARRKSXYRRXAXHXRXRXQPPPAYADGDPREVSSAVSVQFRMTVRAFSVSIRPIRXTX 
250. 260 270 280 290 300 

The complete strain B sequence (ORF5-1) and ORF5a show 92.7% identity in 300 aa overlap: 



orf 5 .pep 
orf 5a 



10 20 30 40 50 60 

orf 5a . pep MDGAQPKTNFXXRLIARLAREPDSAEDVLTLLRQAHEQEVFDADTLLRLEKVLDFSDLEV 
lllllMllt ltllttlttttlltttl:illtltniltttllllltlllllllllll 
orf 5-1 MDGAQPKTNFFERLIARLAREPDSAEDVLNLLRQAHEQEVFDADTLLRLEKVLDFSDLEV 

10 20 30 40 50 60 

70 80 90 100 110 120 

orf 5a . pep RDAMITRSRMNVLKENDSIERXTAYVIDTAHSRFPVIGEDKDEVLGILHAKDLLKYMFNP 
I t I t I t I 1 I I 1 I I I II i 1 I t I I I I 1 t I I I I I I t I i I I I i I I I I I I I I I I I I I I 1 I 1 I 1 I I 
orf 5-1 RDAMITRSRMNVLKENDSIERITAYVIDTAHSRFPVIGEDKDEVLGILHAKDLLKYMFNP 

70 80 90 100 110 120 



130 140 150 160 170 180 

orf 5a . pep EQFHLKSILRPAVFVPEGKSLTALLKEFREQRNHMAIVIDEYGGTSGLVTFEDIIEQIVG
I I I M I t II I I I I I I I I M I I I I I t I M I I 1 I I I I I I I M t I t I I I I I I M M I I I I I I I 
orf 5-1 EQFHLKSILRPAVFVPEGKSLTALLKEFREQRNHMAIVIDEYGGTSGLVTFEDIIEQIVG 

130 140 150 160 170 180 



190 200 210 220 230 240 

orf 5a . pep DIEDEFDEDESADNIHAVSAERWRIHAATEIEDINAFFGTEYSSEEADTIGGXGHSGIGT
: M I I I I I t : I I I ) I II I I : I ) I I II I I I I M I 11 : II t I I I t I I I I M I III : I I 
orf 5-1 EIEDEFDEDDSADNIHAVSSERWRIHAATEIEDINTFFGTEYSSEEADTIRP-GHSRVGT 

190 200 210 220 230 



250 260 270 280 290 300 
orf5a.pep PARARRKSXYRRXAXHXRXRXQPPPAYADGDPREVSSAVSVQFRMTVRAFSVSIRPIRXT
lllltii III i i l:l llllllilllllMI:ll|:IMIMIilllllllll 1 
orf5-1 SARARRKSPYRRFAVHRRTRRQPPPAYADGDPREVSTAVSAQFRMTVRAFSVSIRPIRQT
240 250 260 270 280 290 

Further work identified a partial DNA sequence in N.gonorrhoeae <SEQ ID 25> which encodes
a protein having amino acid sequence <SEQ ID 26; ORF5ng>:

1 MDGAQPKTNF FERLIARLAR EPDSAEDVLN LLRQAHEQEV FDADTLTRLE 

51 KVLDFAELEV RDAMITRSRM NVLKENDSIE RITAYVIDTA HSRFPVIGED 

101 KDEVLGILHA KDLLKYMFNP EQFHLKSVLR PAVFVPEGKS LTALLKEFRE 

151 QRNHMAIVID EYGGTSGLVT FEDIIEQIVG DIEDEFDEDE SADDIHSVSA 

201 ERWRIHAATE IEDINAFFGT EYGSEEADTI RRLGHSGIGT PARARRKSPY

251 RRFAVHRRPR RQPPPAHADG DPREVSRACP HRRFCTV* 

Further analysis revealed the complete gonococcal nucleotide sequence <SEQ ID 27> to be: 

1 ATGGACGGCG CACAACCGAA AACAAATTTT TTTGAACGCC TGATTGCCCG 

51 ACTCGCCCGC GAACCCGATT CCGCCGAAGA CGTATTAAAC CTGCTTCGGC 

101 AGGCGCACGA ACAGGAAGTT TTTGATGCCG ACACACTGAC CCGGCTGGAA 

151 AAAGTATTGG ACTTTGCCGA GCTGGAAGTG CGCGATGCGA TGATTACGCG 

201 CAGCCGCATG AACGTATTGA AAGAAAACGA CAGCATCGAA CGCATCACCG 

251 CCTACGTCAT CGATACCGCC CATTCGCGCT TCCCCGTCAT CGGCGAAGAC 

301 AAAGACGAAG TTTTGGGCAT TTTGCACGCC AAAGACCTGC TCAAATATAT 

351 GTTCAACCCC GAGCAGTTCC ACCTGAAATC CGTCTTGCGC CCTGCCGTTT 

401 TCGTGCCCGA AGGCAAATCT TTGACCGCCC TTTTAAAAGA GTTCCGCGAA 

451 CAGCGCAACC ATATGGCAAT CGTCATCGAC GAATACGGCG GCACGTCGGG 

501 TTTGGTCACC TTTGAAGACA TCATCGAGCA AATCGTCGGT GACATCGAAG 

551 ACGAGTTTGA CGAAGACGAA AGCGccgacg acatCCACTC cgTTTccgCC 

601 GAACGCTGGC GCATCCacgc ggctaCCGAA ATCGAAGaca TCAACGCCTT 

651 TTTCGGTACG GAatacggca gcgaagaagc cgacaccatc cggcggctTG 

701 GTCATTCAGG AATTGGGACA CCTGCCCGTG CGCGGCGAAA AAGTCCTTAt 

751 cggcgGTTTG Cagttcaccg tCGCCCGCGC CGACAACCGC CGCCTGCACA 

801 CGCTGATGGC GACCCGCGTG AAGTAAGCAG AGCCTGCCcg AccgccgttT 

851 CTGCacAGTT TAGGatgACG gtaCGGTCGT TTTCTGTTTC AATCCGCCCC 

901 ATCCGCCAAA CATAA 

This encodes a protein having amino acid sequence <SEQ ID 28; ORF5ng-1>:

1 MDGAQPKTNF FERLIARLAR EPDSAEDVLN LLRQAHEQEV FDADTLTRLE
51 KVLDFAELEV RDAMITRSRM NVLKENDSIE RITAYVIDTA HSRFPVIGED 
101 KDEVLGILHA KDLLKYMFNP EQFHLKSVLR PAVFVPEGKS LTALLKEFRE 
151 QRNHMAIVID EYGGTSGLVT FEDIIEQIVG DIEDEFDEDE SADDIHSVSA 
201 ERWRIHAATE IEDINAFFGT EYGSEEADTI RRLGHSGIGT PARARRKSPY
251 RRFAVHRRPR RQPPPAHADG DPREVSRACP TAVSAQFRMT VRSFSVSIRP 
301 IRQT* 
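
The correspondence between a nucleotide sequence and the protein it encodes can be checked mechanically. The following is a minimal sketch and not part of the original disclosure: it assumes the standard genetic code and reading frame 1, and the name translate is illustrative only. Codons containing an ambiguous or unreadable base are reported as X, and translation stops at the first stop codon (*).

```python
from itertools import product

# Standard genetic code, with codons enumerated in T, C, A, G order
# (first base varies slowest, third base fastest).
BASES = "TCAG"
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    "".join(codon): aa
    for codon, aa in zip(product(BASES, repeat=3), AMINO_ACIDS)
}


def translate(dna: str) -> str:
    """Translate a DNA string in frame 1; unknown codons become 'X'."""
    # Drop the blanks used to group the listing into blocks of ten bases,
    # and fold lower-case bases to upper case.
    seq = "".join(dna.split()).upper()
    protein = []
    for i in range(0, len(seq) - 2, 3):
        aa = CODON_TABLE.get(seq[i:i + 3], "X")
        protein.append(aa)
        if aa == "*":  # stop codon ends the open reading frame
            break
    return "".join(protein)


# First thirty nucleotides of <SEQ ID 27>, as listed above.
print(translate("ATGGACGGCG CACAACCGAA AACAAATTTT"))  # MDGAQPKTNF
```

Applied to the first thirty nucleotides of <SEQ ID 27>, the sketch returns MDGAQPKTNF, the first ten residues of ORF5ng-1.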

The originally-identified partial strain B sequence (ORF5) shows 83.1% identity over a 135aa
overlap with the partial gonococcal sequence (ORF5ng):

orf5 NHMAIVIDEYGGTSGLVTFEDIIEQIVGEI 30 

I II t I I I I I I I I I I M II II I I I i I II I: I 
orf 5ng FHLKSVLRPAVFVPEGKSLTALLKEFREQRNHMAIVIDEYGGTSGLVTFEDIIEQIVGDI 182

orf5 EDEFDEDDSADNIHAVSSDTWRIHAATEIEDINTFFGTEYSIEEADTIXRPGHSRVGTSA 90 

I II I I II : I It : I I : II : : I I II I I II I I I I I : I t I I II : I I t III I III : I I I 
orf5ng EDEFDEDESADDIHSVSAERWRIHAATEIEDINAFFGTEYGSEEADTIRRLGHSGIGTPA 242 

orf 5 RARRKSPYRRFAVHRRTRRQPPPAYADGDPREVSX RRFCTV 131 

I I I I I II I II I II I t I II n I I t : I I I t I I I M I I II t I 

orf5ng RARRKSPYRRFAVHRRPRRQPPPAHADGDPREVSRACPHRRFCTV 287 

The complete strain B and gonococcal sequences (ORF5-1 & ORF5ng-1) show 92.4% identity in
304 aa overlap:

10 20 30 40 50 60 

orf 5ng-l . pep MDGAQPKTNFFERLIARLAREPDSAEDVLNLLRQAHEQEVFDADTLTRLEKVLDFAELEV 
lltlllllllltlltlllilllllllllMllllltttMMIMI

orf5-1 MDGAQPKTNFFERLIARLAREPDSAEDVLNLLRQAHEQEVFDADTLLRLEKVLDFSDLEV
10 20 30 • 40 50 60 

70 80 90 100 110 120 

orf5ng-1.pep RDAMITRSRMNVLKENDSIERITAYVIDTAHSRFPVIGEDKDEVLGILHAKDLLKYMFNP

I I I I I I i t M I I I I I I I I I I 1 1 I i ) I I I I I 1 1 I I 1 1 I 1 1 1 1 I I I I I I I I I t M M I I I I I 

orf5-1 RDAMITRSRMNVLKENDSIERITAYVIDTAHSRFPVIGEDKDEVLGILHAKDLLKYMFNP
70 80 90 100 110 120 

130 140 150 160 170 180 

orf5ng-1.pep EQFHLKSVLRPAVFVPEGKSLTALLKEFREQRNHMAIVIDEYGGTSGLVTFEDIIEQIVG
M I M I I : I I I I I I I I I I I I I I I I I I I I t I I I I M I I I M I I I I i I I I I I I I I I I I I I f t 
orf5-1 EQFHLKSILRPAVFVPEGKSLTALLKEFREQRNHMAIVIDEYGGTSGLVTFEDIIEQIVG
130 140 150 160 170 180 

190 200 210 220 230 240 

orf5ng-1.pep DIEDEFDEDESADDIHSVSAERWRIHAATEIEDINAFFGTEYGSEEADTIRRLGHSGIGT
: I I I i I 1 I I : I I t : M : I I : I i t I I I I i I I I I I I I : t t I I I I : I i I I I I M III : M 
orf5-1 EIEDEFDEDDSADNIHAVSSERWRIHAATEIEDINTFFGTEYSSEEADTIRP-GHSRVGT
190 200 210 220 230 

250 260 270 280 290 300 

orf5ng-1.pep PARARRKSPYRRFAVHRRPRRQPPPAHADGDPREVSRACPTAVSAQFRMTVRSFSVSIRP
11 I I I I II I M I II I t I I I I I I I I : II I I I i I I I i I I I I t I t II II : I I II I I I 

orf5-1 SARARRKSPYRRFAVHRRTRRQPPPAYADGDPREVS----TAVSAQFRMTVRAFSVSIRP

240 250 260 270 280 290 



or f 5ng- 1 . pep IRQTX 
I I I II 

orf5-l IRQTX 
300 

Computer analysis of these amino acid sequences indicated a putative leader sequence and
identified the following homologies:

Homology with hemolysin homolog TlyC (accession U32716) of H.influenzae
ORF5 and TlyC proteins show 58% aa identity in 77 aa overlap (BLASTp).



ORF5   2 HMAIVIDEYGGTSGLVTFEDIIEQIVGEIEDEFDEDDSADNIHAVSSDTWRIHAATEIED  61
         HMAIV+DE+G  SGLVT EDI+EQIVG+IEDEFDE++ AD I  +S  T+ + A T+I+D
TlyC 166 HMAIVVDEFGAVSGLVTIEDILEQIVGDIEDEFDEEEIAD-IRQLSRHTYAVRALTDIDD 224

ORF5  62 INTFFGTEYSIEEADTI  78
          N  F T++  EE DTI
TlyC 225 FNAQFNTDFDDEEVDTI 241





ORF5ng-1 also shows significant homology with TlyC:

SCORES Init1: 301 Initn: 419 Opt: 668

Smith-Waterman score: 668; 45.9% identity in 242 aa overlap 

10 20 30 40 50 

orf5ng-1.pep MDGAQPKTNFFERLIARLAR-EPDSAEDVLNLLRQAHEQEVFDADTLTRLEK

I M: |::|: : I : | :::::: | :::::::: | :| :| 
tlyc_haein MNDEQQNSNQSENTKKPFFQSLFGRFFQGELKNREELVEVIRDSEQNDLIDQNTREMIEG 
10 20 30 40 50 60 

60 70 80 90 100 109 

orf5ng-1.pep VLDFAELEVRDAMITRSRMNVLKENDSIERITAYVIDTAHSRFPVIGE--DKDEVLGILH
l:::lll:||| II II:: :::::::: : I : : I I I I I I I 1 : : |:|:::|||| 

tlyc_haein VMEIAELRVRDIMIPRSQIIFIEDQQDLNTCLNTIIESAHSRFPVIADADDRDNIVGILH 
70 80 90 100 110 120 

110 120 130 140 150 160 

orf 5ng-l . pep AKDLLKYMF-NPEQFHLKSVLRPAVFVPEGKSLTALLKEFREQRNHMAIVIDEYGGTSGL 
llllll:: : I t 1:1:111:1:111:1 : :||:M :l I I II I : I t : I : : I II 
tlyc_haein AKDLLKFLREDAEVFDLSSLLRPVVIVPESKRVDRMLKDFRSERFHMAIVVDEFGAVSGL



130 140 150 160 170 180

170 180 190 200 210 220 

orf 5ng-l . pep VTFEDIIEQIVGDIEDEFDEDESADDIHSVSAERWRIHAATEIEDINAFFGTEYGSEEAD 
5 I I : I I I : I I I I I t I I I I t I I : I i t |:::| : : ::| t:|:|:|l l:t:: 

tlyc haein VTIEDILEQIVGDIEDEFDEEEIAD-IRQLSRHTYAVRALTDIDDFNAQFNTDFDDEEVD 
190 200 210 220 230 

230 240 250 260 270 280 

orf5ng-1.pep TIRRLGHSGIG-TPARARRKSPYRRFAVHRRPRRQPPPAHADGDPREVSRACPTAVSAQF

II I : :l I I: 

tlyc_haein TIGGLIMQTFGYLPKRGEEIILKNLQFKVTSADSRRLIQLRVTVPDEHLAEMNNVDEKSE 
240 250 260 270 280 290 

Homology with a hypothetical secreted protein from E.coli:

ORF5a shows homology to a hypothetical secreted protein from E.coli: 

sp|P77392|YBEX_ECOLI HYPOTHETICAL 33.3 KD PROTEIN IN CUTE-ASNB INTERGENIC REGION
>gi|1778577 (U82598) similar to H. influenzae [Escherichia coli] >gi|1786879
(AE000170) f292; This 292 aa ORF is 23% identical (9 gaps) to 272 residues of an
approx. 440 aa protein YTFL_HAEIN SW: P44717 [Escherichia coli] Length = 292

Score = 212 bits (533), Expect = 3e-54
Identities = 112/230 (48%), Positives = 149/230 (64%), Gaps = 3/230 (1%)

Query:   2 DGAQPKTNFXXRLIARLAR-EPDSAEDVLTLLRQAHEQEVFDADTLLRLEKVLDFSDLEV  60
           D    K  F   L+++L   EP + +++L L+R + + ++ D DT   LE V+D +D  V
Sbjct:  10 DTISNKKGFFSLLLSQLFHGEPKNRDELLALIRDSGQNDLIDEDTRDMLEGVMDIADQRV  69

Query:  61 RDAMITRSRMNVLKENDSIERITAYVIDTAHSRFPVIGEDKDEVLGILHAKDLLKYM-FN 119
           RD MI RS+M  LK N +++     +I++AHSRFPVI EDKD +  GIL AKDLL +M  +
Sbjct:  70 RDIMIPRSQMITLKRNQTLDECLDVIIESAHSRFPVISEDKDHIEGILMAKDLLPFMRSD 129

Query: 120 PEQFHLKSILRPAVFVPEGKSLTALLKEFREQRNHMAIVIDEYGGTSGLVTFEDIIEQIV 179
            E F +  +LR AV VPE K +  +LKEFR QR HMAIVIDE+GG SGLVT EDI+E IV
Sbjct: 130 AEAFSMDKVLRQAVVVPESKRVDRMLKEFRSQRYHMAIVIDEFGGVSGLVTIEDILELIV 189

Query: 180 GDIEDEFDEDESADNIHAVSAERWRIHAATEIEDINAFFGTEYSSEEADT 229
           G+IEDE+DE++  D    +S   W + A   IED N  FGT +S EE DT
Sbjct: 190 GEIEDEYDEEDDID-FRQLSRHTWTVRALASIEDFNEAFGTHFSDEEVDT 238

Based on this analysis, including the amino acid homology to the TlyC hemolysin-homologue from
H.influenzae (hemolysins are secreted proteins), it was predicted that the proteins from
N.meningitidis and N.gonorrhoeae are secreted and could thus be useful antigens for vaccines or
diagnostics.

ORF5-1 (30.7kDa) was cloned in the pGex vector and expressed in E.coli, as described above. The
products of protein expression and purification were analyzed by SDS-PAGE. Figure 2A shows
the results of affinity purification of the GST-fusion protein. Purified GST-fusion protein was used
to immunise mice, whose sera were used for Western blot analysis (Figure IB). These experiments
confirm that ORF5-1 is a surface-exposed protein, and that it is a useful immunogen.

Example 5 

The following partial DNA sequence was identified in N.meningitidis <SEQ ID 29>:



1 ATGCGCGGCG GCAGGCCGGA TTCCGTTACC GTGCAGATTA TCGAAGGTTC 
51 GCGTTTTTCG CATATGAGGA AAGTCATCGA CGCAACGCCC GACATCGGAC 
101 ACGACACCAA AGGCTGGAGC AATGAAAAAC TGATGGCGGA AGTTGCGCCC

151 GATGCCTTCA GCGGCAATCC TGAAgGGCAG TTTTTCCCCG ACAGCTACGA 

201 AATCGATGCG GGCGGCAGTG ATTTGCAGAT TTACCAAACC GCCTACAAgG 

251 GCGATGCAAC GCCGCCTGAA TGAgGGCATG GGAAAGCAGG CAGGACGGGC 

301 TGCCTTATAA AAACCCTTAT GAAATGCTGA TTATGGCGAr CCTGGTCGAA

351 AAGGAAACAG GGCATGAAGC CGAsCsCGAC CATGTcGCTT CCGTCTTCGT 

401 CAACCGCCTG AAAATCGGTA TGCGCCTGCA AACCgAssCG TCCGTGATTT 

451 ACGGCATGGG TGCGGCATAC AAGGGCAAAA TCCGTAAAGC CGACCTGCGC 

501 CGCGACACGC CGTACAACAC CTACACGCGC GGCGGTCTGC CGCCAACCCC 

551 GATTGCGCTG CCC. . 

This corresponds to the amino acid sequence <SEQ ID 30; ORF7>: 

1 MRGGRPDSVT VQIIEGSRFS HMRKVIDATP DIGHDTKGWS NEKLMAEVAP

51 DAFSGNPEGQ FFPDSYEIDA GGSDLQIYQT AYKAMQRRLN EAWESRQDGL 

101 PYKNPYEMLI MAXLVEKETG HEAXXDHVAS VFVNRLKIGM RLQTXXSVIY 

151 GMGAAYKGKI RKADLRRDTP YNTYTRGGLP PTPIALP..

Further sequence analysis revealed the complete DNA sequence <SEQ ID 31>: 

1 ATGTTGAGAA AATTGTTGAA ATGGTCTGCC GTTTTTTTGA CCGTGTCGGC 

51 AGCCGTTTTC GCCGCGCTGC TTTTTGTTCC TAAGGATAAC GGCAGGGCAT 

101 ACCGAATCAA AATTGCCAAA AACCAGGGTA TTTCGTCGGT CGGCAGGAAA 

151 CTTGCCGAAG ACCGCATCGT GTTCAGCAGG CATGTTTTGA CGGCGGCGGC 

201 CTACGTTTTG GGTGTGCACA ACAGGCTGCA TACGGGGACG TACAGATTGC 

251 CTTCGGAAGT GTCTGCTTGG GATATCTTGC AGAAAATGCG CGGCGGCAGG 

301 CCGGATTCCG TTACCGTGCA GATTATCGAA GGTTCGCGTT TTTCGCATAT 

351 GAGGAAAGTC ATCGACGCAA CGCCCGACAT CGGACACGAC ACCAAAGGCT 

401 GGAGCAATGA AAAACTGATG GCGGAAGTTG CGCCCGATGC CTTCAGCGGC 

451 AATCCTGAAG GGCAGTTTTT CCCCGACAGC TACGAAATCG ATGCGGGCGG 

501 CAGTGATTTG CAGATTTACC AAACCGCCTA CAAGGCGATG CAACGCCGCC 

551 TGAATGAGGC ATGGGAAAGC AGGCAGGACG GGCTGCCTTA TAAAAACCCT 

601 TATGAAATGC TGATTATGGC GAGCCTGGTC GAAAAGGAAA CAGGGCATGA

651 AGCCGACCGC GACCATGTCG CTTCCGTCTT CGTCAACCGC CTGAAAATCG 

701 GTATGCGCCT GCAAACCGAC CCGTCCGTGA TTTACGGCAT GGGTGCGGCA 

751 TACAAGGGCA AAATCCGTAA AGCCGACCTG CGCCGCGACA CGCCGTACAA 

801 CACCTACACG CGCGGCGGTC TGCCGCCAAC CCCGATTGCG CTGCCCGGCA 

851 AGGCGGCACT CGATGCCGCC GCCCATCCGT CCGGCGAAAA ATACCTGTAT 

901 TTCGTGTCCA AAATGGACGG CACGGGCTTG AGCCAGTTCA GCCATGATTT 

951 GACCGAACAC AATGCCGCCG TCCGCAAATA TATTTTGAAA AAATAA 

This corresponds to the amino acid sequence <SEQ ID 32; ORF7-1>:

1 MLRKLLKWSA VFLTVSAAVF AALLFVPKDN GRAYRIKIAK NQGISSVGRK

51 LAEDRIVFSR HVLTAAAYVL GVHNRLHTGT YRLPSEVSAW DILQKMRGGR 

101 PDSVTVQIIE GSRFSHMRKV IDATPDIGHD TKGWSNEKLM AEVAPDAFSG 

151 NPEGQFFPDS YEIDAGGSDL QIYQTAYKAM QRRLNEAWES RQDGLPYKNP 

201 YEMLIMASLV EKETGHEADR DHVASVFVNR LKIGMRLQTD PSVIYGMGAA

251 YKGKIRKADL RRDTPYNTYT RGGLPPTPIA LPGKAALDAA AHPSGEKYLY 

301 FVSKMDGTGL SQFSHDLTEH NAAVRKYILK K* 
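
The partial sequence ORF7 lies within the complete sequence ORF7-1. As an illustration only, the position of a partial sequence within a complete one can be located as sketched below. The name find_partial is hypothetical, and treating X in the partial sequence as a wildcard for an undetermined residue is an assumption made here rather than a method stated in this document.

```python
def find_partial(partial: str, complete: str, wildcard: str = "X") -> int:
    """Return the 0-based offset of `partial` within `complete`, or -1.

    A `wildcard` character in `partial` matches any residue (assumption:
    X marks a residue that could not be determined).
    """
    n = len(partial)
    for start in range(len(complete) - n + 1):
        window = complete[start:start + n]
        if all(p == c or p == wildcard for p, c in zip(partial, window)):
            return start
    return -1


# Residues 81-110 of ORF7-1 and the first ten residues of ORF7,
# both copied from the listings above.
orf7_1_segment = "YRLPSEVSAWDILQKMRGGRPDSVTVQIIE"
orf7_start = "MRGGRPDSVT"

offset = find_partial(orf7_start, orf7_1_segment)
print(offset, 81 + offset)  # 15 96
```

Here the N-terminus of the partial ORF7 sequence is found at residue 96 of ORF7-1.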

Computer analysis of this amino acid sequence gave the following results: 

Homology with hypothetical protein encoded by yceg gene (accession P44270) of H.influenzae
ORF7 and yceg proteins show 44% aa identity in 192 aa overlap:

ORF7 1 MRGGRPDSVTVQIIEGSRFSHMRKVIDATPDIGHDTKGWSNEKLMA EVAPDAFSG 55 

+ G+ V+ lEG F RK ++ P + K SNE++ A ++ + 

yceg 102 LNSGKEVQFNVKWIEGKTFKDWRKDLENAPHLVQTLKDKSNEEIFALLDLPDIGQNLELK 161 

ORF7 56 NPEGQFFPDSYEIDAGGSDLQIYQTAYKAMQRRLNEAWESRQDGLPYKNPYEMLIMAXLV 115

N EG +PD+Y +DL++ + + + M++ LN+AW R + LP NPYEMLI+A +V 

yceg 162 NVEGWLYPDTYNYTPKSTDLELLKRSAERMKKALNKAWNERDEDLPLANPYEMLILASIV 221

ORF7 116 EKETGHEAXXDHVASVFVNRLKIGMRLQTXXSVIYGMGAAYKGKIRKADLRRDTPYNTYT 175

EKETG VASVF+NRLK M+LQT +VIYGMG Y G IRK DL TPYNTY 

yceg 222 EKETGIANERAKVASVFINRLKAKMKLQTDPTVIYGMGENYNGNIRKKDLETKTPYNTYV 281 
ORF7 176 RGGLPPTPIALP 187

GLPPTPIA+P 
yceg 282 IDGLPPTPIAMP 293 

The complete length YCEG protein has sequence: 

1 MKKFLIAILL LILILAGVAS FSYYKMTEFV KTPVNVQADE LLTIERGTTS

51 SKLATLFEQE KLIADGKLLP YLLKLKPELN KIKAGTYSLE NVKTVQDLLD 

101 LLNSGKEVQF NVKWIEGKTF KDWRKDLENA PHLVQTLKDK SNEEIFALLD 

151 LPDIGQNLEL KNVEGWLYPD TYNYTPKSTD LELLKRSAER MKKALNKAWN 

201 ERDEDLPLAN PYEMLILASI VEKETGIANE RAKVASVFIN RLKAKMKLQT 

251 DPTVIYGMGE NYNGNIRKKD LETKTPYNTY VIDGLPPTPI AMPSESSLQA

301 VANPEKTDFY YFVMGSGGH KFTRNLNEHN KAVQEYLRWY RSQKNAK 



Homology with a predicted ORF from N.meningitidis (strain A)

ORF7 shows 95.2% identity over a 187aa overlap with an ORF (ORF7a) from strain A of
N.meningitidis:

10 20 30 

orf 7 . pep MRGGRPDSVTVQIIEGSRFSHMRKVIDATP 

I I I I M I i I I I I I I I t I I i i I I I i I I t I I t 
orf 7a AAYVLGVHNRLHTGTYRLPSEVSAWDILQKMRGGRPDSVTVQIIEGSRFSHMRKVIDATP
70 80 90 100 110 120 



40 50 60 70 80 90 

orf 7 . pep DIGHDTKGWSNEKLMAEVAPDAFSGNPEGQFFPDSYEIDAGGSDLQIYQTAYKAMQRRLN 
II I I II II I II I I I I t I 1 I I t t I I I I I t t I I I M t I I I I II I I I : M I I I I I I M i I t 
orf 7a DIEHDTKGWSNEKLMAEVAPDAFSGNPEGQFFPDSYEIDAGGSDLRIYQIAYKAMQRRLN 
130 140 150 160 170 180 

100 110 120 130 140 150 

orf 7 . pep EAWESRQDGLPYKNPYEMLIMAXLVEKETGHEAXXDHVASVFVNRLKIGMRLQTXXSVIY 
I II I I I I II I I I I I I 1 II I I I I 1:11111111 I II I II II II I II I I I II I MM 
orf 7a EAWESRQDGLPYKNPYEMLIMASLIEKETGHEADRDHVASVFVNRLKIGMRLQTDPSVIY 
190 200 210 220 230 240 



160 170 180 

orf 7 . pep GMGAAYKGKIRKADLRRDTPYNTYTRGGLPPTPIALP 
lllllllllllllltMlllltllllllllllMIII 
orf 7a GMGAAYKGKIRKADLRRDTPYNTYTRGGLPPTPIALPGKAALDAAAHPSGEKYLYFVSKM 
250 260 270 280 290 300 



orf7a 



DGTGLSQFSHDLTEHNAAVRKYILKKX 
310 320 330 



The complete length ORF7a nucleotide sequence <SEQ ID 33> is: 



1 ATGTTGAGAA AATTGTTGAA ATGGTCTGCC GTTTTTTTGA CCGTATCGGC
51 AGCCGTTTTC GCCGCGCTGC TTTTCGTCCC TAAAGACAAC GGCAGGGCAT
101 ACAGGATTAA AATTGCCAAA AACCAGGGTA TTTCGTCGGT CGGCAGGAAA
151 CTTGCCGAAG ACCGCATCGT GTTCAGCAGG CATGTTTTGA CGGCGGCGGC
201 CTACGTTTTG GGTGTGCACA ACAGGCTGCA TACGGGGACG TACAGACTGC
251 CTTCGGAAGT GTCTGCTTGG GATATCTTGC AGAAAATGCG CGGCGGCAGG
301 CCGGATTCCG TTACCGTGCA GATTATCGAA GGTTCGCGTT TTTCGCATAT
351 GAGGAAAGTC ATCGACGCAA CGCCCGACAT CGAACACGAC ACCAAAGGCT
401 GGAGCAATGA AAAACTGATG GCGGAAGTTG CCCCTGATGC CTTCAGCGGC
451 AATCCTGAAG GGCAGTTTTT CCCCGACAGC TACGAAATCG ATGCGGGCGG
501 CAGCGATTTA CGGATTTACC AAATCGCCTA CAAGGCGATG CAACGCCGAC
551 TGAATGAGGC ATGGGAAAGC AGGCAGGACG GGCTGCCTTA TAAAAACCCT
601 TATGAAATGC TGATTATGGC GAGCCTGATC GAAAAGGAAA CAGGGCATGA
651 AGCCGACCGC GACCATGTCG CTTCCGTCTT CGTCAACCGC CTGAAAATCG
701 GTATGCGCCT GCAAACCGAC CCGTCCGTGA TTTACGGCAT GGGTGCGGCA
751 TACAAGGGCA AAATCCGTAA AGCCGACCTG CGCCGCGACA CGCCGTACAA
801 CACCTACACG CGCGGCGGTC TGCCGCCAAC CCCGATCGCG CTGCCCGGCA
851 AGGCGGCACT CGATGCCGCC GCCCATCCGT CCGGTGAAAA ATACCTGTAT
901 TTCGTGTCCA AAATGGACGG TACGGGCTTG AGCCAGTTCA GCCATGATTT
951 GACCGAACAC AACGCCGCCG TTCGCAAATA TATTTTGAAA AAATAA
This is predicted to encode a protein having amino acid sequence <SEQ ID 34>:



1 MLRKLLKWSA VFLTVSAAVF AALLFVPKDN GRAYRIKIAK NQGISSVGRK
51 LAEDRIVFSR HVLTAAAYVL GVHNRLHTGT YRLPSEVSAW DILQKMRGGR
101 PDSVTVQIIE GSRFSHMRKV IDATPDIEHD TKGWSNEKLM AEVAPDAFSG
151 NPEGQFFPDS YEIDAGGSDL RIYQIAYKAM QRRLNEAWES RQDGLPYKNP
201 YEMLIMASLI EKETGHEADR DHVASVFVNR LKIGMRLQTD PSVIYGMGAA
251 YKGKIRKADL RRDTPYNTYT RGGLPPTPIA LPGKAALDAA AHPSGEKYLY
301 FVSKMDGTGL SQFSHDLTEH NAAVRKYILK K*



A leader peptide is underlined. 



ORF7a and ORF7-1 show 98.8% identity in 331 aa overlap:



10 20 30 40 50 60 

orf 7a . pep MLRKLLKWSAVFLTVSAAVFAALLFVPKDNGRAYRIKIAKNQGISSVGRKLAEDRIVFSR 
lltllliMllltlllMIIIillttMlllllillltlllttlllMlllllillllll 
orf 7-1 MLRKLLKWSAVFLTVSAAVFAALLFVPKDNGRAYRIKIAKNQGISSVGRKLAEDRIVFSR 

10 20 30 40 50 60 

70 80 90 100 110 120 

orf 7a . pep HVLTAAAYVLGVHNRLHTGTYRLPSEVSAWDILQKMRGGRPDSVTVQIIEGSRFSHMRKV 
M I I I I I I I I I I I i I I I t t t I I i M t t I I M I i i I I I I I I I I I I I I i I i I I I I I I I I I I i 
orf 7-1 HVLTAAAYVLGVHNRLHTGTYRLPSEVSAWDILQKMRGGRPDSVTVQIIEGSRFSHMRKV 

70 80 90 100 110 120 

130 140 150 160 170 180 

or f 7a . pep IDATPDIEHDTKGWSNEKLMAEVAPDAFSGNPEGQFFPDSYEIDAGGSDLRIYQIAYKAM 
I I I I t I I I I I I I t I I t M I I I t I I I I I I M I M t I i M I t I I I t I I I I I : I I I I I I I I 
orf 7-1 IDATPDIGHDTKGWSNEKLMAEVAPDAFSGNPEGQFFPDSYEIDAGGSDLQIYQTAYKAM

130 140 150 160 170 180 

190 200 210 220 230 240 

or f 7 a . pep QRRLNEAWESRQDGLPYKNPYEMLIMASLIEKETGHEADRDHVASVFVNRLKIGMRLQTD 
i I I t I I I I I I I I I I I I I I M I M i I I i I I : i i I I i i I I I I I I i t I I I I I I I t I t I I I I [ i 
orf 7 -1 QRRLNEAWESRQDGLPYKNPYEMLIMASLVEKETGHEADRDHVASVFVNRLKIGMRLQTD 

190 200 210 220 230 240 

250 260 270 280 290 300 

orf 7 a. pep PSVIYGMGAAYKGKIRKADLRRDTPYNTYTRGGLPPTPIALPGKAALDAAAHPSGEKYLY 
I t t i 1 I I 1 I I I I I i I I I I t t I I [ I I I i i I [ I I I I I I I It I t I t I I I I i I I I t t t I 11 I I I 
orf 7-1 PSVIYGMGAAYKGKIRKADLRRDTPYNTYTRGGLPPTPIALPGKAALDAAAHPSGEKYLY

250 260 270 280 290 300 

310 320 330 

orf 7a . pep FVSKMDGTGLSQFSHDLTEHNAAVRKYILKKX

1 1 M 1 1 1 1 1 i 1 1 i 1 1 1 1 1 1 1 i 1 1 1 1 1 1 n I I I 

orf 7-1 FVSKMDGTGLSQFSHDLTEHNAAVRKYILKKX 
310 320 330 
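
The identity figures quoted for these alignments can be reproduced from the aligned rows themselves. The following is a minimal sketch, assuming that percent identity is the number of columns with identical residues divided by the total number of aligned columns (gap columns included); the name percent_identity is illustrative, and the alignment program used for this document may have applied a slightly different convention.

```python
def percent_identity(row_a: str, row_b: str, gap: str = "-") -> float:
    """Percent of aligned columns in which both rows carry the same residue."""
    if len(row_a) != len(row_b):
        raise ValueError("aligned rows must have equal length")
    # A column counts as a match only if the residues agree and are not gaps.
    matches = sum(1 for a, b in zip(row_a, row_b) if a == b and a != gap)
    return 100.0 * matches / len(row_a)


# Residues 121-140 of orf7a and orf7-1, copied from the alignment above.
orf7a = "IDATPDIEHDTKGWSNEKLM"
orf7_1 = "IDATPDIGHDTKGWSNEKLM"
print(f"{percent_identity(orf7a, orf7_1):.1f}% identity in {len(orf7a)} aa overlap")
# 95.0% identity in 20 aa overlap
```

Over the full alignment, the two listed sequences differ at four of 331 positions (E/G, R/Q, I/T and I/V), and 327/331 rounds to the 98.8% stated above.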

Homology with a predicted ORF from N.gonorrhoeae

ORF7 shows 94.7% identity over a 187aa overlap with a predicted ORF (ORF7.ng) from
N.gonorrhoeae:

orf7 MRGGRPDSVTVQIIEGSRFSHMRKVIDATPDIGHDTKGWSNEKLMAEVAPDAFSGNPEGQ 60
t t i t I i I I I I t I I I I I I I t I i I I I I I I I I I I i M I I I I I I I I I I 1 I I I I I I I I I M I I I I
orf7ng MRGGRPDSVTVQIIEGSRFSHMRKVIDATPDIGHDTKGWSNEKLMAEVAPDAFSGNPEGQ 60

orf7 FFPDSYEIDAGGSDLQIYQTAYKAMQRRLNEAWESRQDGLPYKNPYEMLIMAXLVEKETG 120
I i I I I I I I I I I I I I I t I t I I I I I t I I t M I M I : I I i I I t I I t I I I t I M I 1:11111
orf7ng FFPDSYEIDAGGSDLQIYQTAYKAMQRRLNEAWAGRQDGLPYKNPYEMLIMASLIEKETG 120

orf7 HEAXXDHVASVFVNRLKIGMRLQTXXSVIYGMGAAYKGKIRKADLRRDTPYNTYTRGGLP 180
Ml I I II 1 I I I I I It II I I I I I i II t I I II I I I t I II I It M II I M I I I I Mil
orf7ng HEADRDHVASVFVNRLKIGMRLQTDPSVIYGMGAAYKGKIRKADLRRDTPYNTYTGGGLP 180

orf7 PTPIALP 187
II I I M
orf7ng PTRIALPGKAAMDAAAHPSGEKYLYFVSKMDGTGLSQFSHDLTEHNAAVRKYILKK 236

An ORF7ng nucleotide sequence <SEQ ID 35> is predicted to encode a protein having amino acid
sequence <SEQ ID 36>:

1 MRGGRPDSVT VQIIEGSRFS HMRKVIDATP DIGHDTKGWS NEKLMAEVAP

51 DAFSGNPEGQ FFPDSYEIDA GGSDLQIYQT AYKAMQRRLN EAWAGRQDGL 

101 PYKNPYEMLI MASLIEKETG HEADRDHVAS VFVKRLKIGM RLQTDPSVIY 

151 GMGAAYKGKI RKADLRRDTP YNTYTGGGLP PTRIALPGKA AMDAAAHPSG 

201 EKYLYFVSKM DGTGLSQFSH DLTEHNAAVR KYILKK* 

Further sequence analysis revealed a partial DNA sequence of ORF7ng <SEQ ID 37>:

1 . . taccgaatca AGATTGCCAA AAATCAGGGT ATTTCGTCGG TCGGCAGGAA 

51 ACTTGCcgaA GACCGCATCG TGTTCAGCAG GCATGTTTTG ACAGCGGCGG 

101 CCTACGTTTT GGGTGTGCAC AACAGGCTGC ATACGGGGAC gTACAGATTG 

151 CCTTCGGAAG TGTCTGCTTG GGATATCTTG CAGAAAATGC GCGGCGGCAG 

201 GCCGGATTCC GTTACCGTGC AGATTATCGA AGGTTCGCGT TTTTCGCATA

251 TGAGGAAAGT CATCGACGCA ACGCCCGACA TCGGACACGA CACCAAAGGC 

301 TGGAGCAATG AAAAACTGAT GGCGGAAGTT GCGCCCGATG CCTTCAGCGG 

351 CAATCCTGAA GGGCAGTTTT TTCCCGACAG CTACGAAATC GATGCGGGCG 

401 GCAGCGATTT GCAGATTTAC CAAACCGCCT ACAAGGCGAT GCAACGCCGC

451 CTGAACGAGG CATGGGCAGG CAGGCAGGAC GGGCTGCCTT ATAAAAACCC

501 TTATGAAATG CTGATTATGG CGAGCCTGAT CGAAAAGGAA ACGGGGCATG 

551 AGGCCGACCG CGACCATGTC GCTTCCGTCT TCGTCAACCG CCTGAAAATC 

601 GGTATGCGCC TGCAAACCGA CCCGTCCGTG ATTTACGGCA TGGGTGCGGC 

651 ATACAAGGGC AAAATCCGTA AAGCCGACCT GCGCCGCGAC ACGCCGTACA 

701 aCAccTAtac gggcgggggc ttgccgccaa cccggattgc gctgcccggC

751 Aaggcggcaa tggatgccgc cgcccacccg tccggcgaAa aatacctgTa 

801 tttcgtgtcC AAAATGGACG GCACGGGCTT GAGCCAGTTC AGCCATGATT 

851 TGACCGAACA CAACGCCGCc gTcCGCAAAT ATATTTTGAA AAAATAA 

This corresponds to the amino acid sequence <SEQ ID 38; ORF7ng-1>:

1 ..YRIKIAKNQG ISSVGRKLAE DRIVFSRHVL TAAAYVLGVH NRLHTGTYRL

51 PSEVSAWDIL QKMRGGRPDS VTVQIIEGSR FSHMRKVIDA TPDIGHDTKG 

101 WSNEKLMAEV APDAFSGNPE GQFFPDSYEI DAGGSDLQIY QTAYKAMQRR 

151 LNEAWAGRQD GLPYKNPYEM LIMASLIEKE TGHEADRDHV ASVFVNRLKI 

201 GMRLQTDPSV IYGMGAAYKG KIRKADLRRD TPYNTYTGGG LPPTRIALPG

251 KAAMDAAAHP SGEKYLYFVS KMDGTGLSQF SHDLTEHNAA VRKYILKK*

ORF7ng-l and ORF7-1 show 98.0% identity in 298 aa overlap: 

10 20 30 40 50 60 

orf7-l.pep KLLKWSAVFLTVSAAVFAALLFVPKDNGRAYRIKIAKNQGISSVGRKLAEDRIVFSRHVL 

I t I I t I I I I I t t M I I t I I I I I I I I I I I I I 
orf7ng-1 YRIKIAKNQGISSVGRKLAEDRIVFSRHVL

10 20 30 

70 80 90 100 110 120 

orf7-1.pep TAAAYVLGVHNRLHTGTYRLPSEVSAWDILQKMRGGRPDSVTVQIIEGSRFSHMRKVIDA
M I i I I I M I I I I i i I I t t I I I I I I i I M I I I I I I I i I I I M I M t i I t I I I I I I I I i I i

orf7ng-1 TAAAYVLGVHNRLHTGTYRLPSEVSAWDILQKMRGGRPDSVTVQIIEGSRFSHMRKVIDA
40 50 60 70 80 90 

130 140 150 160 170 180 

orf7-1.pep TPDIGHDTKGWSNEKLMAEVAPDAFSGNPEGQFFPDSYEIDAGGSDLQIYQTAYKAMQRR

I I I I I I I I I I I I I I I I M I I t I I I M I I I I I I I I I I I I M I I I M t I t If I I I I 1 I I I i t 
orf7ng-l TPDIGHDTKGWSNEKLMAEVAPDAFSGNPEGQFFPDSYEIDAGGSDLQIYQTAYKAMQRR 
100 110 120 130 140 150 

190 200 210 220 230 240

orf 7-1 . pep LNEAWESRQDGLPYKNPYEMLIMASLVEKETGHEADRDHVASVFVNRLKIGMRLQTDPSV 
I I I I I :[ I I I I t I I I I i M I M I I I : I I I I I I I I I I I I I I I M I t I I M I I I I t I I I I I 
orf7ng-l LNEAWAGRQDGLPYKNPYEMLIMASLIEKETGHEADRDHVASVFVNRLKIGMRLQTDPSV 
160 170 180 190 200 210 






250 260 270 280 290 300 

orf7-1.pep IYGMGAAYKGKIRKADLRRDTPYNTYTRGGLPPTPIALPGKAALDAAAHPSGEKYLYFVS
I I 1 [ I I I t I I I I I M I I I t I I I t I I I I I I I i I I I I I I I I I I : I I I I 1 t I I I M I I I I I
orf7ng-1 IYGMGAAYKGKIRKADLRRDTPYNTYTGGGLPPTRIALPGKAAMDAAAHPSGEKYLYFVS
220 230 240 250 260 270 

310 320 330 

orf 7-1 . pep KMDGTGLSQFSHDLTEHNAAVRKYILKKX 
I I M I I i I t I I I i [ i I I I I I I I I I i I I [ I 
orf7ng-l KMDGTGLSQFSHDLTEHNAAVRKYILKKX 
280 290 



In addition, ORF7ng-1 shows significant homology with a hypothetical E.coli protein:

sp|P28306|YCEG_ECOLI HYPOTHETICAL 38.2 KD PROTEIN IN PABC-HOLB INTERGENIC REGION
gi|1787339 (AE000210) o340; 100% identical to fragment YCEG_ECOLI SW: P28306 but
has 97 additional C-terminal residues [Escherichia coli] Length = 340



Score = 79 (36.2 bits), Expect = 5.0e-57, Sum P(2) = 5.0e-57
Identities = 20/87 (22%), Positives = 40/87 (45%)

Query:  10 GISSVGRKLAEDRIVFSRHVLTAAAYVLGVHNRLHTGTYRLPSEVSAWDILQKMRGGRPD  69
           G  ++G +L  D+I+    V      +    +    GTYR   +++  ++L+ +  G+
Sbjct:  49 GRLALGEQLYADKIINRPRVFQWLLRIEPDLSHFKAGTYRFTPQMTVREMLKLLESGKEA 108

Query:  70 SVTVQIIEGSRFSHMRKVIDATPDIGH  96
              ++++EG R S   K +   P I H
Sbjct: 109 QFPLRLVEGMRLSDYLKQLREAPYIKH 135

Score = 438 (200.7 bits), Expect = 5.0e-57, Sum P(2) = 5.0e-57
Identities = 84/155 (54%), Positives = 111/155 (71%)

Query: 120 EGQFFPDSYEIDAGGSDLQIYQTAYKAMQRRLNEAWAGRQDGLPYKNPYEMLIMASLIEK 179
           EG F+PD++   A  +D+ + + A+K M + ++ AW GR DGLPYK+  +++ MAS+IEK
Sbjct: 158 EGWFWPDTWMYTANTTDVALLKRAHKKMVKAVDSAWEGRADGLPYKDKNQLVTMASIIEK 217

Query: 180 ETGHEADRDHVASVFVNRLKIGMRLQTDPSVIYGMGAAYKGKIRKADLRRDTPYNTYTGG 239
           ET   ++RD VASVF+NRL+IGMRLQTDP+VIYGMG  Y GK+ +ADL   T YNTYT
Sbjct: 218 ETAVASERDKVASVFINRLRIGMRLQTDPTVIYGMGERYNGKLSRADLETPTAYNTYTIT 277

Query: 240 GLPPTRIALPGKAAMDAAAHPSGEKYLYFVSKMDG 274
           GLPP  IA PG  ++ AAAHP+   YLYFV+   G
Sbjct: 278 GLPPGAIATPGADSLKAAAHPAKTPYLYFVADGKG 312





Based on this analysis, including the fact that the H.influenzae YCEG protein possesses a possible
leader sequence, it is predicted that the proteins from N.meningitidis and N.gonorrhoeae, and their
epitopes, could be useful antigens for vaccines or diagnostics, or for raising antibodies.



Example 6 

The following partial DNA sequence was identified in N.meningitidis <SEQ ID 39>:

1 CGTTTCAAAA TGTTAACTGT GTTGACGGCA ACCTTGATTG CCGGACAGGT 

51 ATCTGCCGCC GGAGGCGGTG CGGGGGATAT GAAACAGCCG AAGGAAGTCG 

101 GAAAGGTTTT CAGAAAGCAG CAGCGTTACA GCGAGGAAGA AATCAAAAAC 

151 GAACGCGCAC GGCTTGCGGC AGTGGGCGAG CGGGTTAATC AGATATTTAC 

201 GTTGCTGGGA GGGGAAACCG CCTTGCAAAA GGGGCAGGCG GGAACGGCTC 

251 TGGCAACCTA TATGCTGATG TTGGAACGCA CAAAATCCCC CGAAGTCGCC 

301 GAACGCGCCT TGGAAATGGC CGTGTCGCTG AACGCGTTTG AACAGGCGGA

351 AATGATTTAT CAGAAATGGC GGCAGATTGA GCCTATACCG GGTAAGGCGC 

401 AAAAACGGGC GGGGTGGCTG CGGAACGTGC TGAGGGAAAG AGGAAATCAG 

451 CATCTGGACG GACGGGAAGA AGTGCTGGCT CAGGCGGACG AAGGACAG 

This corresponds to the amino acid sequence <SEQ ID 40; ORF9>:

1 . . RFKMLTVLTA TLIAGQVSAA GGGAGDMKQP KEVGKVFRKQ QRYSEEEIKN 

51 ERARLAAVGE RVNQIFTLLG GETALQKGQA GTALATYMLM LERTKSPEVA 

101 ERALEMAVSL NAFEQAEMIY QKWRQIEPIP GKAQKRAGWL RNVLRERGNQ 
151 HLDGREEVLA QADEGQ

Further sequence analysis revealed the complete DNA sequence <SEQ ID 41>: 

1 ATGTTACCTA ACCGTTTCAA AATGTTAACT GTGTTGACGG CAACCTTGAT 

51 TGCCGGACAG GTATCTGCCG CCGGAGGCGG TGCGGGGGAT ATGAAACAGC 

101 CGAAGGAAGT CGGAAAGGTT TTCAGAAAGC AGCAGCGTTA CAGCGAGGAA 

151 GAAATCAAAA ACGAACGCGC ACGGCTTGCG GCAGTGGGCG AGCGGGTTAA 

201 TCAGATATTT ACGTTGCTGG GAGGGGAAAC CGCCTTGCAA AAGGGGCAGG 

251 CGGGAACGGC TCTGGCAACC TATATGCTGA TGTTGGAACG CACAAAATCC 

301 CCCGAAGTCG CCGAACGCGC CTTGGAAATG GCCGTGTCGC TGAACGCGTT 

351 TGAACAGGCG GAAATGATTT ATCAGAAATG GCGGCAGATT GAGCCTATAC 

401 CGGGTAAGGC GCAAAAACGG GCGGGGTGGC TGCGGAACGT GCTGAGGGAA 

451 AGAGGAAATC AGCATCTGGA CGGACTGGAA GAAGTGCTGG CTCAGGCGGA 

501 CGAAGGACAG AACCGCAGGG TGTTTTTATT GTTGGCACAA GCCGCCGTGC 

551 AACAGGACGG GTTGGCGCAA AAAGCATCGA AAGCGGTTCG CCGCGCGGCG 

601 TTGAAATATG AACATCTGCC CGAAGCGGCG GTTGCCGATG TGGTGTTCAG 

651 CGTACAGGGA CGCGAAAAGG AAAAGGCAAT CGGAGCTTTG CAGCGTTTGG 

701 CGAAGCTCGA TACGGAAATA TTGCCCCCCA CTTTAATGAC GTTGCGTCTG 

751 ACTGCACGCA AATATCCCGA AATACTCGAC GGCTTTTTCG AGCAGACAGA 

801 CACCCAAAAC CTTTCGGCCG TCTGGCAGGA AATGGAAATT ATGAATCTGG 

851 TTTCCCTGCA CAGGCTGGAT GATGCCTATG CGCGTTTGAA CGTGCTGTTG 

901 GAACGCAATC CGAATGCAGA CCTGTATATT CAGGCAGCGA TATTGGCGGC 

951 AAACCGAAAA GAAGGTGCTT CCGTTATCGA CGGCTACGCC GAAAAGGCAT 

1001 ACGGCAGGGG GACGGAGGAA CAGCGGAGCA GGGCGGCGCT AACGGCGGCG 

1051 ATGATGTATG CCGACCGCAG GGATTACGCC AAAGTCAGGC AGTGGCTGAA 

1101 AAAAGTATCC GCGCCGGAAT ACCTGTTCGA CAAAGGTGTG CTGGCGGCTG 

1151 CGGCGGCTGT CGAGTTGGAC GGCGGCAGGG CGGCTTTGCG GCAGATCGGC 

1201 AGGGTGCGGA AACTTCCCGA ACAGCAGGGG CGGTATTTTA CGGCAGACAA 

1251 TTTGTCCAAA ATACAGATGC TCGCCCTGTC GAAGCTGCCC GATAAACGGG 

1301 AGGCTTTGAG GGGGTTGGAC AAGATTATCG AAAAACCGCC TGCCGGCAGT 

1351 AATACAGAGT TACAGGCAGA GGCATTGGTA CAGCGGTCAG TTGTTTACGA 

1401 TCGGCTTGGC AAGCGGAAAA AAATGATTTC AGATCTTGAA AGGGCGTTCA 

1451 GGCTTGCACC CGATAACGCT CAGATTATGA ATAATCTGGG CTACAGCCTG 

1501 CTGACCGATT CCAAACGTTT GGACGAAGGT TTCGCCCTGC TTCAGACGGC 

1551 ATACCAAATC AACCCGGACG ATACCGCTGT CAACGACAGC ATAGGCTGGG 

1601 CGTATTACCT GAAAGGCGAC GCGGAAAGCG CGCTGCCGTA TCTGCGGTAT 

1651 TCGTTTGAAA ACGACCCCGA GCCCGAAGTT GCCGCCCATT TGGGCGAAGT 

1701 GTTGTGGGCA TTGGGCGAAC GCGATCAGGC GGTTGACGTA TGGACGCAGG 

1751 CGGCACACCT TACGGGAGAC AAGAAAATAT GGCGGGAAAC GCTCAAACGT 

1801 CACGGCATCG CATTGCCCCA ACCTTCCCGA AAACCTCGGA AATAA 

This corresponds to the amino acid sequence <SEQ ID 42; ORF9-1>:

1 MLPNRFKMLT VLTATLIAGQ VSAAGGGAGD MKQPKEVGKV FRKQQRYSEE

51 EIKNERARLA AVGERVNQIF TLLGGETALQ KGQAGTALAT YMLMLERTKS 

101 PEVAERALEM AVSLNAFEQA EMIYQKWRQI EPIPGKAQKR AGWLRNVLRE 

151 RGNQHLDGLE EVLAQADEGQ NRRVFLLLAQ AAVQQDGLAQ KASKAVRRAA

201 LKYEHLPEAA VADVVFSVQG REKEKAIGAL QRLAKLDTEI LPPTLMTLRL

251 TARKYPEILD GFFEQTDTQN LSAVWQEMEI MNLVSLHRLD DAYARLNVLL

301 ERNPNADLYI QAAILAANRK EGASVIDGYA EKAYGRGTEE QRSRAALTAA

351 MMYADRRDYA KVRQWLKKVS APEYLFDKGV LAAAAAVELD GGRAALRQIG

401 RVRKLPEQQG RYFTADNLSK IQMLALSKLP DKREALRGLD KIIEKPPAGS

451 NTELQAEALV QRSVVYDRLG KRKKMISDLE RAFRLAPDNA QIMNNLGYSL

501 LTDSKRLDEG FALLQTAYQI NPDDTAVNDS IGWAYYLKGD AESALPYLRY

551 SFENDPEPEV AAHLGEVLWA LGERDQAVDV WTQAAHLTGD KKIWRETLKR

601 HGIALPQPSR KPRK* 

Computer analysis of this amino acid sequence gave the following results: 

Homology with a predicted ORF from N.meningitidis (strain A)

ORF9 shows 89.8% identity over a 166aa overlap with an ORF (ORF9a) from strain A of
N.meningitidis:

10 20 30 40 50 

or f 9 . pep RFKMLTVLTATLIAGQVSAAGGGAGDMKQPKEVGKVFRKQQRYSEEEIKNERARLA 
II :|:lt:|:t:|ll: II ||:| j I I I I I I I I I I I I I II I II I II I I M II 

orf9a MLPARFTILSVLAAALLAGQAYAA--GAADAKPPKEVGKVFRKQQRYSEEEIKNERARLA
10 20 30 40 50

60 70 80 90 100 110 

orf 9 . pep AVGERVNQIFTLLGGETALQKGQAGTALATYMLMLERTKSPEVAERALEMAVSLNAFEQA 

I I I I I I I I It I I I I I t I I I I i I I I I I t I I I I I i I I i I M I I I i i I I I I I I I t I I I I I I I 
orf 9a AVGERVNQIFTLLGXETALQKGQAGTALATYMLMLERTKSPEVAERALEMAVSLNAFEQA 

60 70 80 90 100 110 

120 130 140 150 160 

orf 9 . pep EMIYQKWRQIEPIPGKAQKRAGWLRNVLRERGNQHLDGREEVLAQADEGQ

I I I I I I I I I I I I I t M M I I I I I t I I I I I i I I It I I i I 11 llltll I 
orf 9a EMIYQKWRQIEPIPGKAQKRAGWLRNVLRERGNQHLDGLEEXLAQADEXQNRRVFLLLAQ
120 130 140 150 160 170 



orf9a 



AAVQQDGLAQKASKAVRRAALRYEHLPEAAVADVVFSVQXREKEKAIGALQRLAKLDTEI
180 190 200 210 220 230 



The complete length ORF9a nucleotide sequence <SEQ ID 43> is: 



1 ATGTTACCCG CCCGTTTCAC CATTTTATCT GTGCTCGCGG CAGCCCTGCT
51 TGCCGGGCAG GCGTATGCCG CCGGCGCGGC GGATGCGAAG CCGCCGAAGG
101 AAGTCGGAAA GGTTTTCAGA AAGCAGCAGC GTTACAGCGA GGAAGAAATC
151 AAAAACGAAC GCGCACGGCT TGCGGCAGTG GGCGAGCGGG TTAATCAGAT
201 ATTTACGTTG CTGGGANGGG AAACCGCCTT GCAAAAGGGG CAGGCGGGAA
251 CGGCTCTGGC AACCTATATG CTGATGTTGG AACGCACAAA ATCCCCCGAA
301 GTCGCCGAAC GCGCCTTGGA AATGGCCGTG TCNCTGAACG CGTTTGAACA
351 GGCGGAAATG ATTTATCAGA AATGGCGGCA GATTGAGCCT ATACCGGGTA
401 AGGCGCAAAA ACGGGCGGGG TGGCTGCGGA ACGTGCTGAG GGAAAGAGGA
451 AATCAGCATC TAGACGGACT GGAAGAANTG CTGGCTCAGG CGGACGAANG
501 ACAGAACCGC AGGGTGTTTT TATTGTTGGC ACAAGCCGCC GTGCAACAGG
551 ACGGGTTGGC GCAAAAAGCA TCGAAAGCGG TTCGCCGCGC GGCGTTGAGA
601 TATGAACATC TGCCCGAAGC GGCGGTTGCC GATGTGGTGT TCAGCGTACA
651 GGNACGCGAA AAGGAAAAGG CAATCGGAGC TTTGCAGCGT TTGGCGAAGC
701 TCGATACGGA AATATTGCCC CCCACTTTAA TGACGTTGCG TCTGACTGCA
751 CGCAAATATC CCGAAATACT CGACGGCTTT TTCGAGCAGA CAGACACCCA
801 AAACCTTTCG GCCGTCTGGC AGGAAATGGA AATTATGAAT CTGGTTTCCC
851 TGCACAGGCT GGATGATGCC TATGCGCGTT TGAACGTGCT GTTGGAACGC
901 AATCCGAATG CAGACCTGTA TATTCAGGCA GCGATATTGG CGGCAAACCG
951 AAAAGAANGT GCTTCCGTTA TCGACGGCTA CGCCGAAAAG GCATACGGCA
1001 GGGGGACGGG GGAACAGCGG GGCAGGGCGG CAATGACGGC GGCGATGATA
1051 TATGCCGACC GAAGGGATTA CACCAAAGTC AGGCAGTGGT TGAAAAAAGT
1101 GTCCGCGCCG GAATACCTGT TCGACAAAGG TGTGCTGGCG GCTGCGGCGG
1151 CTGTCGAGTT GGACNGCGGC AGGGCGGCTT TGCGGCAGAT CGGCAGGGTG
1201 CGGAAACTTC CCGAACAGCA GGGGCGGTAT TTTACGGCAG ACAATTTGTC
1251 CAAAATACAG ATGTTCGCCC TGTCGAAGCT GCCCGACAAA CGGGAGGCTT
1301 TGAGGGGGTT GGACAAGATT ATCGAAAAAC CGCCTGCCGG CAGTAATACA
1351 GAGTTACAGG CAGAGGCATT GGTACAGCGG TCAGTTGTTT ACGATCGGCT
1401 TGGCAAGCGG AAAAAAATGA TTTCAGATCT TGAAAGGGCG TTCAGGCTTG
1451 CACCCGATAA CGCTCAGATT ATGAATAATC TGGGCTACAG CCTGCTTTCC
1501 GATTCCAAAC GTTTGGACGA AGGCTTCGCC CTGCTTCAGA CGGCATACCA
1551 AATCAACCCG GACGATACCG CTGTCAACGA CAGCATAGGC TGGGCGTATT
1601 ACCTGAAANG CGACGCGGAA AGCGCGCTGC CGTATCTGCG GTATTCGTTT
1651 GAAAACGACC CCGAGCCCGA AGTTGCCGCC CATTTGGGCG AAGTGTTGTG
1701 GGCATTGGGC GAACGCGATC AGGCGGTTGA CGTATGGACG CAGGCGGCAC
1751 ACCTTACGGG AGACAAGAAA ATATGGCGGG AAACGCTCAA ACGTCACGGC
1801 ATCGCATTGC CCCAACCTTC CCGAAAACCT CGGAAATAA



This encodes a protein having amino acid sequence <SEQ ID 44>: 



1 MLPARFTILS VLAAALLAGQ AYAAGAADAK PPKEVGKVFR KQQRYSEEEI 
51 KNERARLAAV GERVNQIFTL LGXETALQKG QAGTALATYM LMLERTKSPE 
101 VAERALEMAV SLNAFEQAEM IYQKWRQIEP IPGKAQKRAG WLRNVLRERG 
151 NQHLDGLEEX LAQADEXQNR RVFLLLAQAA VQQDGLAQKA SKAVRRAALR 
201 YEHLPEAAVA DVVFSVQXRE KEKAIGALQR LAKLDTEILP PTLMTLRLTA 
251 RKYPEILDGF FEQTDTQNLS AVWQEMEIMN LVSLHRLDDA YARLNVLLER 
301 NPNADLYIQA AILAANRKEX ASVIDGYAEK AYGRGTGEQR GRAAMTAAMI 
351 YADRRDYTKV RQWLKKVSAP EYLFDKGVLA AAAAVELDXG RAALRQIGRV 
401 RKLPEQQGRY FTADNLSKIQ MFALSKLPDK REALRGLDKI IEKPPAGSNT 
451 ELQAEALVQR SVVYDRLGKR KKMISDLERA FRLAPDNAQI MNNLGYSLLS 
501 DSKRLDEGFA LLQTAYQINP DDTAVNDSIG WAYYLKXDAE SALPYLRYSF 
551 ENDPEPEVAA HLGEVLWALG ERDQAVDVWT QAAHLTGDKK IWRETLKRHG 
601 IALPQPSRKP RK* 



ORF9a and ORF9-1 show 95.3% identity in 614 aa overlap: 

10 20 30 40 50 

orf9a.pep MLPARFTILSVLAAALLAGQAYAAG--AADAKPPKEVGKVFRKQQRYSEEEIKNERARLA 
III t I : I : II : I : I : I II : III I : I I II I II I i I II II I II I II I I I I I I I I I 
orf 9-1 MLPNRFKMLTVLTATLIAGQVSAAGGGAGDMKQPKEVGKVFRKQQRYSEEEIKNERARLA 

10 20 30 40 50 60 

60 70 80 90 100 110 

orf 9a . pep AVGERVNQIFTLLGXETALQKGQAGTALATYMLMLERTKSPEVAERALEMAVSLNAFEQA 

I I I I II i I I i I I i I t I I I I I I I M II II II t I i I I I I I I M I I I I t t I I I t I I I I I I I I 
orf9-1 AVGERVNQIFTLLGGETALQKGQAGTALATYMLMLERTKSPEVAERALEMAVSLNAFEQA 

70 80 90 100 110 120 

120 130 140 150 160 170 

orf 9a . pep EMIYQKWRQIEPI PGKAQKRAGWLRNVLRERGNQHLDGLEEXLAQADEXQNRRVFLLLAQ 
I I I I I I I I t I I I I t I I I I I I II I I I I It I I I I I II I I II I I II i I I I I M I I I I I M I 
orf 9-1 EMIYQKWRQIEPIPGKAQKRAGWLRNVLRERGNQHLDGLEEVLAQADEGQNRRVFLLLAQ 
130 140 150 160 170 180 

180 190 200 210 220 230 

orf 9a . pep AAVQQDGLAQKASKAVRRAALRYEHLPEAAVADWFSVQXREKEKAIGALQRLAKLDTEI 
I I I i I I I 1 I i I I I I I I I t I I I : N I I t I I I I I I I I I t i I I I I I I I I I I I I I I M I I I II 
orf9-1 AAVQQDGLAQKASKAVRRAALKYEHLPEAAVADWFSVQGREKEKAIGALQRLAKLDTEI 
190 200 210 220 230 240 

240 250 260 270 280 290 

orf9a.pep LPPTLMTLRLTARKYPEILDGFFEQTDTQNLSAVWQEMEIMNLVSLHRLDDAYARLNVLL 
I I I II [ I I II I I I M I I I t I t I I II I I I I I I I I I I I M I I I I I I I I I I I I I i I M I t 11 I 
orf 9-1 LPPTLMTLRLTARKYPEILDGFFEQTDTQNLSAVWQEMEIMNLVSLHRLDDAYARLNVLL 
250 260 270 280 290 300 

300 310 320 330 340 350 

orf 9a . pep ERNPNADLYIQAAILAANRKEXASVIDGYAEKAYGRGTGEQRGRAAMTAAMI YADRRDYT 
I I I I I I I I I I I I I I I I I I I I i I t M I I t I I I I t I I I i I I I : I I I : I I I I : I I I I t I I : 
orf 9-1 ERNPNADLYIQAAILAANRKEGASVIDGYAEKAYGRGTEEQRSRAALTAAMMYADRRDYA 
310 320 330 340 350 360 

360 370 380 390 400 410 

orf9a.pep KVRQWLKKVSAPEYLFDKGVLAAAAAVELDXGRAALRQIGRVRKLPEQQGRYFTADNLSK 
I I I I I I I I I I I I t I I I I I I I I I I II I I I I I I I t I M ! I I II I M I I II I I I II I I i I I I 
orf 9-1 KVRQWLKKVSAPEYLFDKGVLAAAAAVELDGGRAALRQIGRVRKLPEQQGRYFTADNLSK 
370 380 390 400 410 420 

420 430 440 450 460 470 

orf 9a . pep IQMFALSKLPDKREALRGLDKIIEKPPAGSNTELQAEALVQRSWYDRLGKRKKMISDLE 
I I I : I I I I I I I M I I I I I I I II I I t I I It I I I I t I t I I I I I I I I I t M t I M I II I t I I t 
orf 9-1 IQMLALSKLPDKREALRGLDKIIEKPPAGSNTELQAEALVQRSWYDRLGKRKKMISDLE 
430 440 450 460 470 480 

480 490 500 510 520 530 

orf 9a . pep RAFRLAPDNAQIMNNLGYSLLSDSKRLDEGFALLQTAYQINPDDTAVNDSIGWAYYLKXD 

III iitiii nil 11111111:1111 II III ttiiiiiiitiiii 11 till II mil I 

orf 9-1 RAFRLAPDNAQIMNNLGYSLLTDSKRLDEGFALLQTAYQINPDDTAVNDSIGWAYYLKGD 
490 500 510 520 530 540 

540 550 560 570 580 590 

orf 9a . pep AESALPYLRYSFENDPEPEVAAHLGEVLWALGERDQAVDVWTQAAHLTGDKKIWRETLKR 
I I I I I I I I I 1 I 1 t t I I I I II i t 1 II t t It I I It t I I I I II I I I I I I I 1 I I 1 I t I I I I t I i 
orf 9-1 AESALPYLRYSFENDPEPEVAAHLGEVLWALGERDQAVDVWTQAAHLTGDKKIWRETLKR 
550 560 570 580 590 600 

600 610 
orf 9a . pep HGIALPQPSRKPRKX 
I I I I I I I I I 1 I I f I I 
orf 9-1 HGIALPQPSRKPRKX 
610 
Homology with a predicted ORF from N. gonorrhoeae 

ORF9 shows 82.8% identity over a 163aa overlap with a predicted ORF (ORF9.ng) from N. 
gonorrhoeae: 



orf9 RFKMLTVLTATLIAGQVSAAGGGAGDMKQPKEVGKVFRKQQRYSEEEIKNERAR 54 

II :|:||:|:|:|||: II M:|:: II I II 1 1 : 1 1 : : II II t 1 1 1 II 1 II 

orf9ng MIMLPARFTILSVLAAALLAGQAYAA--GAADVELPKEVGKVLRKHRRYSEEEIKNERAR 58 

orf9 LAAVGERVNQIFTLLGGETALQKGQAGTALATYMLMLERTKSPEVAERALEMAVSLNAFE 114 

1 1 1 II 1 1 1 1 :: 1 II 1 1 1 1 1 M M 1 1 1 1 t t 1 1 t 1 M 1 M II 1 1 1 t t II 1 1 1 1 M 1 1 II 1 1 1 

orf9ng LAAVGERVNRVFTLLGGETALQKGQAGTALATYMLMLERTKSPEVAERALEMAVSLNAFE 118 

orf9 QAEMIYQKWRQIEPIPGKAQKRAGWLRNVLRERGNQHLDGREEVLAQADEGQ 166 

1 II t 1 t II t M 1 1 1 t II : M 1 11 t II 1 II : 1 II III III 1 1 : 1 

orf9ng QAEMIYQKWRQIEPIPGEAQKPAGWLRNVLKEGGNPHLDRLEEVPAQSDYVHQPMIFLLL 178 



The ORF9ng nucleotide sequence <SEQ ID 45> was predicted to encode a protein including amino 
acid sequence <SEQ ID 46>: 

1 MIMLPARFTI LSVLAAALLA GQAYAAGAAD VELPKEVGKV LRKHRRYSEE 

51 EIKNERARLA AVGERVNRVF TLLGGETALQ KGQAGTALAT YMLMLERTKS 

101 PEVAERALEM AVSLNAFEQA EMIYQKWRQI EPIPGEAQKP AGWLRNVLKE 

151 GGNPHLDRLE EVPAQSDYVH QPMIFLLLVQ AAVQHGGVAQ KPSKAVRPAA 

201 YNYEVLPETA GADAVFCVQG PQYEKAIQSF PPCGRNPQTE NIAPPFNELF 

251 RPTARPISPK LLQRFFRTEP NLAKPFRPPG PEMETYQTGF PRPLTRNNPT 

Amino acids 1-28 are a putative leader sequence, and 173-189 are predicted to be a transmembrane 
domain. 
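Residue ranges such as these are 1-based and inclusive, which is easy to get wrong when slicing a sequence programmatically. The sketch below is illustrative only; the helper name `region` and its `first` parameter are hypothetical. It extracts residues 173-189 from the row of <SEQ ID 46> that begins at residue 151.

```python
def region(seq: str, start: int, end: int, first: int = 1) -> str:
    """Return residues start..end (1-based, inclusive).

    `first` is the residue number of seq[0], so a single row of a
    listing can be sliced without loading the whole sequence.
    """
    last = first + len(seq) - 1
    if not (first <= start <= end <= last):
        raise ValueError(f"range {start}-{end} lies outside {first}-{last}")
    return seq[start - first:end - first + 1]


# Row of <SEQ ID 46> beginning at residue 151, with block spacing removed.
row_151 = "GGNPHLDRLEEVPAQSDYVHQPMIFLLLVQAAVQHGGVAQKPSKAVRPAA"
tm_domain = region(row_151, 173, 189, first=151)
print(tm_domain, len(tm_domain))
```

The extracted stretch is the span predicted above to be a transmembrane domain; no hydrophobicity calculation is attempted here.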



Further sequence analysis revealed the complete length ORF9ng DNA sequence <SEQ ID 47>: 



1 ATGTTACCCG CCCGTTTCAC TATTTTATCT GTCCTCGCAG CAGCCCTGCT 
51 TGCCGGACAG GCGTATGCTG CCGGCGCGGC GGATGTGGAG CTGCCGAAGG 
101 AAGTCGGAAA GGTTTTAAGG AAACATCGGC GTTACAGCGA GGAAGAAATC 
151 AAAAACGAAC GCGCACGGCT TGCGGCAGTG GGCGAACGGG TCAACAGGGT 
201 GTTTACGCTG TTGGGCGGTG AAACGGCTTT GCAGAAAGGG CAGGCGGGAA 
251 CGGCTCTGGC AACCTATATG CTGATGTTGG AACGCACAAA ATCCCCCGAA 
301 GTCGCCGAAC GCGCCTTGGA AATGGCCGTG TCGCTGAACG CGTTTGAACA 
351 GGCGGAAATG ATTTATCAGA AATGgcggca gatcgagcct ataCcgggtg 
401 aggcgcaaaa accgGcgggG tggctgcgga acgtattgaa ggaagggGGa 
451 aaTCAGCATC TGGAcgggtt gaaagaggTG CtggcgcaAT cggacgatGT 
501 GCAAAAAcgc aggaTATTTT TGCTGCTGGT GCAAGCCGCC GTGCagcagg 
551 gTGGGGTGGC TCAAAAAGCA TCGAAAGCGG TTCGCcgtgc GGcgttgaAG 
601 TATGAACATC TGCCcgaagc ggcggTTGCC GATGcggTGT TCGGCGTACA 
651 GGGACGCGAA AAGGAAAagg caaTCGAAGC TTTGCAGCGT TTGGCGAAGC 
701 TCGATACGGA AATATTGCCC CCCACTTTAA TGACGTTGCG TCTGACTGCA 
751 CGCAAATATC CCGAAATACT CGACGGCTTT TTCGAGCAGA CAGACACCCA 
801 AAACCTTTCG GCCGTCTGGC AGGAAATGGA AATTATGAAT CTGGTTTCCC 
851 TGCGTAAGCC GGATGATGCC TATGCGCGTT TGAACGTGCT GTTGGAACAC 
901 AACCCGAATG CAAACCTGTA TATTCAGGCG GCGATATTGG CGGCAAACGG 
951 AAAAGAAGGT GCGTCCGTTA TCGACGGCTA CGCCGAAAAG GCATACGGCA 
1001 GGGGGACGGG GGAACAGCGG GGCagggcgg cAATgacggc GGCGATGATA 
1051 TATGCCGACC GCAGGGATTA CGCCAAAGTC AGGCAGTGGT TGAAAAAAGT 
1101 GTCCGCGCCG GAATACCTGT TCGACAAAGG CGTGCTGGCG GCTGCGGCGG 
1151 CTGCCGAATT GGACGGAGGC CGGGCGGCTT TGCGGCAGAT CGGCAGGGTG 
1201 CGGAAACTTC CCGAACAGCA GGGGCGGTAT TTTACGGCAG ACAATTTGTC 
1251 CAAAATACAG ATGCTCGCCC TGTCGAAGCT GCCCGACAAA CGGGAAGCCC 
1301 TGATCGGGCT GAACAACATC ATCGCCAAAC TTTCGGCGGC GGGAAGCACG 
1351 GAACCTTTGG CGGAAGCATT GGCACAGCGT TCCATTATTT ACGaacAGTT 
1401 cggCAAACGG GGAAAAATGA TTGCCGACCT tgaAACcgcg CTCAAACTTA 
1451 CGCCCGATAA TGCACAAATT ATGAATAATC TGGGCTACAG CCTGCTTTCC 
1501 GATTCCAAAC GTTTGGACGA GGGTTTCGCC CTGCTTCAGA CGGCATACCA 
1551 AATCAACCCG GACGATACCG CCGTTAACGA CAGCATAGGC TGGGCGTATT 
1601 ACCTGAAAGG CGACgcggaA AGCGCGCTGC CGTATCTGcg gtattcgttt 
1651 gAAAACGACC CCGAGCCCGA AGTTGCCGCC CATTTGGGCG AAGTGTTGTG 



wo 99/24578 PCT/IB98/01665 


1701 GGCATTGGGC GAACGCGATC AGGCGGTTGA CGTATGGACG CAGGCGGCAC 
1751 ACCTTAGGGG AGACAAGAAA ATATGGCGGG AGACGCTCAA ACGCTACGGA 
1801 ATCGCCTTGC CCGAGCCTTC CCGAAAACCC CGGAAATAA 

This encodes a protein having amino acid sequence <SEQ ID 48>: 

1 MLPARFTILS VLAAALLAGQ AYAAGAADVE LPKEVGKVLR KHRRYSEEEI 

51 KNERARLAAV GERVNRVFTL LGGETALQKG QAGTALATYM LMLERTKSPE 

101 VAERALEMAV SLNAFEQAEM IYQKWRQIEP IPGEAQKPAG WLRNVLKEGG 

151 NQHLDGLKEV LAQSDDVQKR RIFLLLVQAA VQQGGVAQKA SKAVRRAALK 

201 YEHLPEAAVA DAVFGVQGRE KEKAIEALQR LAKLDTEILP PTLMTLRLTA 

251 RKYPEILDGF FEQTDTQNLS AVWQEMEIMN LVSLRKPDDA YARLNVLLEH 

301 NPNANLYIQA AILAANRKEG ASVIDGYAEK AYGRGTGEQR GRAAMTAAMI 

351 YADRRDYAKV RQWLKKVSAP EYLFDKGVLA AAAAAELDGG RAALRQIGRV 

401 RKLPEQQGRY FTADNLSKIQ MLALSKLPDK REALIGLNNI IAKLSAAGST 

451 EPLAEALAQR SIIYEQFGKR GKMIADLETA LKLTPDNAQI MNNLGYSLLS 

501 DSKRLDEGFA LLQTAYQINP DDTAVNDSIG WAYYLKGDAE SALPYLRYSF 

551 ENDPEPEVAA HLGEVLWALG ERDQAVDVWT QAAHLRGDKK IWRETLKRYG 

601 IALPEPSRKP RK* 

ORF9ng and ORF9-1 show 88.1% identity in 614 aa overlap: 

10 20 30 40 50 60 

orf9-1.pep MLPNRFKMLTVLTATLIAGQVSAAGGGAGDMKQPKEVGKVFRKQQRYSEEEIKNERARLA 

III M :|:ll:|:l:MI: III (:|:: II I It II : II :: M I I II II It I i I I I 
orf9ng-1 MLPARFTILSVLAAALLAGQAYAAG--AADVELPKEVGKVLRKHRRYSEEEIKNERARLA 

10 20 30 40 50 

70 80 90 100 110 120 

orf 9-1 . pep AVGERVNQIFTLLGGETALQKGQAGTALATYMLMLERTKSPEVAERALEMAVSLNAFEQA 
II II I II :: II I I I II I I I II t I I II II II I I I I I II II 1 I I I II I I I I I I I II I II I I I 
orf9ng-1 AVGERVNRVFTLLGGETALQKGQAGTALATYMLMLERTKSPEVAERALEMAVSLNAFEQA 
60 70 80 90 100 110 


130 140 150 160 170 180 

orf 9-1 . pep EMIYQKWRQIEPIPGKAQKRAGWLRNVLRERGNQHLDGLEEVLAQADEGQNRRVFLLLAQ 
I I I I I I I I t I II I II : I M 11111111:1 llllllll:|lltl:l: l:ll:llll:| 
orf9ng-1 EMIYQKWRQIEPIPGEAQKPAGWLRNVLKEGGNQHLDGLKEVLAQSDDVQKRRIFLLLVQ 

120 130 140 150 160 170 

190 200 210 220 230 240 

orf9-1.pep AAVQQDGLAQKASKAVRRAALKYEHLPEAAVADWFSVQGREKEKAIGALQRLAKLDTEI 
lltll |:|ltiiMlltlMlllltllltlll:||:tlllilllll IIIMIIIIIII 
orf9ng-1 AAVQQGGVAQKASKAVRRAALKYEHLPEAAVADAVFGVQGREKEKAIEALQRLAKLDTEI 

180 190 200 210 220 230 

250 260 270 280 290 300 

orf9-1.pep LPPTLMTLRLTARKYPEILDGFFEQTDTQNLSAVWQEMEIMNLVSLHRLDDAYARLNVLL 
I I M II I I I It t I I II I I I I i I II I I t I I I I II I I I I I I I I I I I I t : : I I I I t It I I I t 
orf9ng-1 LPPTLMTLRLTARKYPEILDGFFEQTDTQNLSAVWQEMEIMNLVSLRKPDDAYARLNVLL 
240 250 260 270 280 290 

310 320 330 340 350 360 

orf9-1.pep ERNPNADLYIQAAILAANRKEGASVIDGYAEKAYGRGTEEQRSRAALTAAMMYADRRDYA 

1 : I 1 1 I : I I I I I I I It I II I t II ) 1 I I t I I I I I II I I I I 1 I: 1 I I : i t I I : 1 I 1 I I I I I 
orf9ng-1 EHNPNANLYIQAAILAANRKEGASVIDGYAEKAYGRGTGEQRGRAAMTAAMIYADRRDYA 
300 310 320 330 340 350 

370 380 390 400 410 420 

orf 9-1 . pep KVRQWLKKVSAPEYLFDKGVLAAAAAVELDGGRAALRQIGRVRKLPEQQGRYFTADNLSK 
t t 1 I t 1 i I 1 t I t I I I t I I 1 t 1 I 1 1 1 t : t t I I I I I t 1 1 1 1 t II I I I t I I 1 I i I 1 1 I 1 I 1 I t 
orf9ng-l KVRQWLKKVSAPEYLFDKGVLAAAAAAELDGGRAALRQIGRVRKLPEQQGRYFTADNLSK 
360 370 380 390 400 410 


430 440 450 460 470 480 

orf 9-1 . pep IQMLALSKLPDKREALRGLDKIIEKPPAGSNTELQAEALVQRSWYDRLGKRKKMISDLE 
t I 1 i I t I I II I I II i 1 t 1 : : 1 1 t I : : : I I I 1 I 1 : I I I : : t : : : 1 II 111:111 
orf9ng-l IQMLALSKLPDKREALIGLNNIIAKLSAAGSTEPLAEALAQRSIIYEQFGKRGKMIADLE 
420 430 440 450 460 470 



490 500 510 520 530 540 
orf 9-1 . pep RAFRLAPDNAQIMNNLGYSLLTDSKRLDEGFALLQTAYQINPDDTAVNDSIGWAYYLKGD 
I : : I : I t i I i I It I i I I I I I : I I I I I i t I t I M I I t t I I I I I I I M M M I I M I I I I I 
orf9ng-l TALKLTPDNAQIMNNLGYSLLSDSKRLDEGFALLQTAYQINPDDTAVNDSIGWAYYLKGD 
480 490 500 510 520 530 

550 560 570 580 590 600 

orf9-1.pep AESALPYLRYSFENDPEPEVAAHLGEVLWALGERDQAVDVWTQAAHLTGDKKIWRETLKR 
I I M M M I I I I I I t t t I I I I M I I I I M I I I I M I I I I I I I i I I I I I I I I t I t I I I I I 
orf9ng-l AESALPYLRYSFENDPEPEVAAHLGEVLWALGERDQAVDVWTQAAHLRGDKKIWRETLKR 
540 550 560 570 580 590 

610 

orf 9-1 . pep HGIALPQPSRKPRKX 
: I I It I : I I M I I I I 
orf9ng-l YGIALPEPSRKPRKX 
600 610 

In addition, ORF9ng shows significant homology with a hypothetical protein from P. aeruginosa: 

sp|P42810|YHE3_PSEAE HYPOTHETICAL 64.8 KD PROTEIN IN HEMM-HEMA INTERGENIC REGION 
(ORF3) 

>gi|1072999|pir||S49376 hypothetical protein 3 - Pseudomonas aeruginosa >gi|557259 
(X82071) orf 3 [Pseudomonas aeruginosa] Length = 576 
Score = 128 bits (318), Expect = 1e-28 

Identities = 138/587 (23%), Positives = 228/587 (38%), Gaps = 125/587 (21%) 

Query: 67 VFTLLGGETALQKGQAGTALATYMLMLERTKSPEVAERALEMAVSLNAFEQAEMIYQKWR 126 

+++LL E A Q+ + AL+ Y++ ++T+ P V+ERA +A LA ++A W 
Sbjct: 53 LYSLLVAELAGQRNRFDIALSNYWQAQBCTRDPGVSERAFRIAEYLGADQEALDTSLLWA 112 

Query: 127 QIEPIPGEAQKPAG WLRNVLKEGGNQHLDGLKEVLAQSDDVQKRRI 172 

+ P +AQ+ A ++ VL G+ H D L A++D + + 

Sbjct: 113 RSAPDNLDAQRAAAIQLARAGRYEESMVYMEKVLNGQGDTHFDFLALSAAETDPDTRAGL 172 

Query: 173 FXXXXXXXXXXXXXXXKASKAVRRAALKYEHLPEAAVADAVFGVQGREKEKAIEALQRLA 232 

++ KY + + A+ Q ++A+ L+ + 

Sbjct: 173 L QSFDHLLKKYPNNGQLLFGKALLLQQDGRPDEALTLLEDNS 214 

Query: 233 KLDTEILPPTLMTLRLTARK YPEILDGFFEQTDTQNLSAVWQEMEIMNLVSLRKP 287 

E+PL+L + K P+GED + + + + LV + 
Sbjct: 215 ASRHEVAPLLLRSRLLQSMKRSDEALPLLKAGIKEHPDDKRVRLAYARL LVEQNRL 270 

Query: 288 DDAYARLNVLLEHNPN ANLYIQAAI 312 

DDA A L++ P+ A +Y++ + 

Sbjct: 271 DDAKAEFAGLVQQFPDDDDDLRFSLALVCLEAQAWDEARIYLEELVERDSHVDAAHFNLG 330 

Query: 313 -LAANRKEGASVIDGYAEKAYGRGTGEQRGRAAMTAAMIYADRRDYAKVRQWLKKVSAPE 371 

LA +K+ A +D YA+ GG + T++ARDAR + P+ 

Sbjct: 331 RLAEEQKDTARALDEYAQ— VGPGNDFLPAQLRQTDVLLKAGRVDEAAQRLDKARSEQPD 388 

Query: 372 YLFDKXXXXXXXXXXXXXXXXXXRQIGRVRKLPEQQGRYFTADNLSKIQMLALSKLPDKR 431 

Y A L 1+ ALS + 

Sbjct: 389 Y AIQLYLIEAEALSNNDQQE 408 

Query: 432 EALIGLNNIIAKLSAAGSTEPLAEALAQRSIIYEQFGKRGKMIADLETALKLTPDNAQIM 491 

+A + + + ELL RS++ E+ +M DL + PDNA + 

Sbjct: 409 KAWQAIQEGLKQYP EDL-NLLYTRSMLAEKRNDLAQMEKDLRFVIAREPDNAMAL 462 

Query: 492 NNLGYSLLSDSKRLDEGFALLQTAYQINPDDTAVNDSIGWAYYLKGDAESALPYLRYSFE 551 

N LGY+L + R E L+ A+++NPDD A+ DS+GW Y +G A YLR + + 
Sbjct: 463 NALGYTLADRTTRYGEARELILKAHKLNPDDPAILDSMGWINYRQGKLADAERYLRQALQ 522 

Query: 552 NDPEPEVAAHLGEVLWALGERDQAVDVWTQAAHLRGDKKIWRETLKR 598 

P+ EVAAHLGEVLWA G+A+W+ +D+R T+KR 
Sbjct: 523 RYPDHEVAAHLGEVLWAQGRQGDARAIWREYLDKQPDSDVLRRTIKR 569 
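The summary line of a BLAST hit such as the P. aeruginosa match above can be checked mechanically. The sketch below is an illustration only: the regular expression and the function name `parse_blast_stats` are hypothetical, and it handles just the `Identities`, `Positives` and `Gaps` fields in the format printed in this document.

```python
import re

# Matches fields such as "Identities = 138/587 (23%)".
STAT = re.compile(r"(Identities|Positives|Gaps)\s*=\s*(\d+)/(\d+)\s*\((\d+)%\)")


def parse_blast_stats(line: str) -> dict:
    """Map each field name to (count, alignment length, reported percent)."""
    return {
        name: (int(count), int(length), int(percent))
        for name, count, length, percent in STAT.findall(line)
    }


line = "Identities = 138/587 (23%), Positives = 228/587 (38%), Gaps = 125/587 (21%)"
stats = parse_blast_stats(line)
for name, (count, length, reported) in stats.items():
    exact = 100.0 * count / length
    print(f"{name}: {count}/{length} = {exact:.1f}% (reported {reported}%)")
```

For this line the reported whole-number percentages agree with the fractions when the exact value is truncated rather than rounded.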

gi|2983399 (AE000710) hypothetical protein [Aquifex aeolicus] Length = 545 
Score = 81.5 bits (198), Expect = 1e-14 

Identities = 61/198 (30%), Positives = 98/198 (48%), Gaps = 19/198 (9%) 



Query: 408 GRYFTADNL-SKIQMLALSKLPDKREALIGLNNIIAKLSAAGSTEPLAEALAQ 459 

G Y A L K ++LA PDK+E L + +K + + L + 
Sbjct: 335 GNYEDAKRLIEKAKVLA PDKKEILFLEADYYSKTKQYDKALEILKKLEKDYPNDSR 390 

Query: 460 RSIIYEQFGKRGKMIADLETALKLTPDNAQIMNNLGYSLLS--DSKRLDEGFALLQ 513 

+I+Y+ G L A++L P+N N LGYSLL +R++E L++ 
Sbjct: 391 VYFMEAIVYDNLGDIKNAEKALRKAIELDPENPDYYNYLGYSLLLWYGKERVEEAEELIK 450 

Query: 514 TAYQINPDDTAVNDSIGWAYYLKGDAESALPYLRYSF-ENDPEPEVAAHLGEVLWALGER 572 

A + +P++ A DS+GW YYLKGD E A+ YL + E +P V H+G+VL +G + 
Sbjct: 451 KALEKDPENPAYIDSMGWVYYLKGDYERAMQYLLKALREAYDDPWNEHVGDVLLKMGYK 510 

Query: 573 DQAVDVWTQAAHLRGDKK 590 

++A + + +A L + K 
Sbjct: 511 EEARNYYERALKLLEEGK 528 





Based on this analysis, it is predicted that the proteins from N. meningitidis and N. gonorrhoeae, and 
their epitopes, could be useful antigens for vaccines or diagnostics, or for raising antibodies. 



Example 7 

The following partial DNA sequence was identified in N. meningitidis <SEQ ID 49>: 

1 AACCTCTACG CCGGCCCGCA GACCACATCC GTCATCGCAA ACATCGCCGA 

51 CAACCTGCAA CTGGCCAAAG ACTACGGCAA AGTACACTGG TTCGCCTCCC 

101 CGCTCTTCTG GCTCCTGAAC CAACTGCACA ACATCATCGG CAACTGGGGC 

151 TGGGCGATTA TCGTTTTAAC CATCATCGTC AAAGCCGTAC TGTATCCATT 

201 GACCAACGCC TCTTACCGCT CTATGGCGAA AATGCGTGCC GCCGCACCCA 

251 AACTGCAAGC CATCAAAGAG AAATACGGCG ACGACCGTAT GGCGCAACAA 

301 CAGGCGATGA TGCAGCTTTA CACAGACGAG AAAATCAACC CG§CTGGGCG 

351 GCTGCCTGCC TATGCTGTTG C/VAATCCCCG TCTTCATCGG ATTGTATTGG 

401 GCATTGTTCG CCTCCGTAGA ATTGCGCCAG GCACCTTGGC TGGGTTGGAT 

451 TACCGACCTC AGCCGCGCCG ACCCCTACTA CATCCTGCCC ATCATTATGG 

501 CGGCAACGAT GTTCGCCCAA ACTTATCTGA ACCCGCCGCC GAcCGACCCG 

551 ATGCagGCGA AAATGATGAA AATCATGCCG TTGGTTTTCT CsGwCrTGTT 

601 CTTCTTCTTC CCTGCCGGks TGGTATTGTA CTGGGTAGTC AACAACCTCC 

651 TGACCATCGC CCAGCAATGG CACATCAACC GCAGCATCGA AA/\ACAACGC 

701 GCCCAAGGCG AAGTCGTTTC CTAA 

This corresponds to the amino acid sequence <SEQ ID 50; ORF11>: 

1 ..NLYAGPQTTS VIANIADNLQ LAKDYGKVHW FASPLFWLLN QLHNIIGNWG 
51 WAIIVLTIIV KAVLYPLTNA SYRSMAKMRA AAPKLQAIKE KYGDDRMAQQ 
101 QAMMQLYTDE KINPLGGCLP MLLQIPVFIG LYWALFASVE LRQAPWLGWI 
151 TDLSRADPYY ILPIIMAATM FAQTYLNPPP TDPMQAKMMK IMPLVFSXXF 
201 FFFPAGXVLY WVVNNLLTIA QQWHINRSIE KQRAQGEVVS * 

Further sequence analysis revealed the complete DNA sequence <SEQ ID 51>: 

1 ATGGATTTTA AAAGACTCAC GGCGTTTTTC GCCATCGCGC TGGTGATTAT 

51 GATCGGCTGG GAAAAGATGT TCCCCACTCC GAAGCCAGTC CCCGCGCCCC 

101 AACAGGCAGC ACAACAACAG GCCGTAACCG CTTCCGCCGA AGCCGCGCTC 

151 GCGCCCGCAA CGCCGATTAC CGTAACGACC GACACGGTTC AAGCCGTCAT 

201 TGATGAAAAA AGCGGCGACC TGCGCCGGCT GACCCTGCTC AAATACAAAG 

251 CAACCGGCGA CGAAAATAAA CCGTTCATCC TGTTTGGCGA CGGCAAAGAA 

301 TACACCTACG TCGCCCAATC CGAACTTTTG GACGCGCAGG GCAACAACAT 

351 TCTAAZVAGGC ATCGGCTTTA GCGCACCGAA AAAACAGTAC AGCTTGGAAG 

4 01 GCGACAAAGT TGAAGTCCGC CTGAGCGCGC CTGAAACACG CGGTCTGAAA 

451 ATCGACAAAG TTTATACTTT CACCAAAGGC AGCTATCTGG TCAACGTCCG 

501 CTTCGACATC GCCAACGGCA GCGGTCAAAC CGCCAACCTG AGCGCGGACT 

551 ACCGCATCGT CCGCGACCAC AGCGAACCCG AGGGTCAAGG TTACTTTACC 

601 CACTCTTACG TCGGCCCTGT TGTTTATACC CCTGAAGGCA ACTTCCAAAA 

651 AGTCAGCTTT TCCGACTTGG ACGACGATGC CAAATCCGGC AAATCCGAGG 

701 CCGAATACAT CCGCAAAACC CCGACCGGCT GGCTCGGCAT GATTGAACAC 

751 CACrrCATGT CCACCTGGAT TCTCCAACCT AAAGGCAGAC AAAGCGTTTG 

801 CGCCGCAGGC GAGTGCAACA TCGACATCAA ACGCCGCAAC GACAAGCTGT 

851 ACAGCACCAG CGTCAGCGTG CCTTTAGCCG CCATCCAAAA CGGCGCGAAA 

901 GCCGAAGCCT CCATCAACCT CTACGCCGGC CCGCAGACCA CATCCGTCAT 

951 CGCAAACATC GCCGACAACC TGCAACTGGC CAAAGACTAC GGCAAAGTAC 
1001 ACTGGTTCGC CTCCCCGCTC TTCTGGCTCC TGAACCAACT GCACAACATC 

1051 ATCGGCAACT GGGGCTGGGC GATTATCGTT TTAACCATCA TCGTCAAAGC 

1101 CGTACTGTAT CCATTGACCA ACGCCTCTTA CCGCTCTATG GCGAAAATGC 

1151 GTGCCGCCGC ACCC7WVCTG CAAGCCATCA AAGAGAAATA CGGCGACGAC 

1201 CGTATGGCGC AACAACAGGC GATGATGCAG CTTTACACAG ACGAGAAAAT 

1251 CAACCCGCTG GGCGGCTGCC TGCCTATGCT GTTGCAAATC CCCGTCTTCA 

1301 TCGGATTGTA TTGGGCATTG TTCGCCTCCG TAGAATTGCG CCAGGCACCT 

1351 TGGCTGGGTT GGATTACCGA CCTCAGCCGC GCCGACCCCT ACTACATCCT 

1401 GCCCATCATT ATGGCGGCAA CGATGTTCGC CCAAACTTAT CTGAACCCGC 

1451 CGCCGACCGA CCCGATGCAG GCGAAAATGA TGAAAATCAT GCCGTTGGTT 

1501 TTCTCCGTCA TGTTCTTCTT CTTCCCTGCC GGTCTGGTAT TGTACTGGGT 

1551 AGTCAACAAC CTCCTGACCA TCGCCCAGCA ATGGCACATC AACCGCAGCA 

1601 TCGAAAAACA ACGCGCCCAA GGCGAAGTCG TTTCCTAA 

This corresponds to the amino acid sequence <SEQ ID 52; ORF11-1>: 



1 MDFKRLTAFF AIALVIMIGW EKMFPTPKPV PAPQQAAQQQ AVTASAEAAL 

51 APATPITVTT DTVQAVIDEK SGDLRRLTLL KYKATGDENK PFILFGDGKE 

101 YTYVAQSELL DAQGNNILKG IGFSAPKKQY SLEGDKVEVR LSAPETRGLK 

151 IDKVYTFTKG SYLVNVRFDI ANGSGQTANL SADYRIVRDH SEPEGQGYFT 

201 HSYVGPVVYT PEGNFQKVSF SDLDDDAKSG KSEAEYIRKT PTGWLGMIEH 

251 HFMSTWILQP KGRQSVCAAG ECNIDIKRRN DKLYSTSVSV PLAAIQNGAK 

301 AEASINLYAG PQTTSVIANI ADNLQLAKDY GKVHWFASPL FWLLNQLHNI 

351 IGNWGWAIIV LTIIVKAVLY PLTNASYRSM AKMRAAAPKL QAIKEKYGDD 

401 RMAQQQAMMQ LYTDEKINPL GGCLPMLLQI PVFIGLYWAL FASVELRQAP 

451 WLGWITDLSR ADPYYILPII MAATMFAQTY LNPPPTDPMQ AKMMKIMPLV 

501 FSVMFFFFPA GLVLYWVVNN LLTIAQQWHI NRSIEKQRAQ GEVVS* 
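The correspondence between a nucleotide listing such as <SEQ ID 51> and its protein can be checked by translating the DNA in frame. The sketch below is a minimal illustration, not the software used for the original analysis: it assumes the standard genetic code, reads from the first base, and writes `X` for any codon containing an ambiguous base. The names `CODON_TABLE` and `translate` are hypothetical.

```python
BASES = "TCAG"
# Amino acids of the standard genetic code in the order TTT, TTC, TTA, TTG,
# TCT, ... (first base varies slowest, third base fastest); '*' marks a stop.
AMINO = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    b1 + b2 + b3: AMINO[16 * i + 4 * j + k]
    for i, b1 in enumerate(BASES)
    for j, b2 in enumerate(BASES)
    for k, b3 in enumerate(BASES)
}


def translate(dna: str) -> str:
    """Translate DNA in frame 1, ignoring the whitespace between 10-base blocks."""
    seq = "".join(dna.split()).upper()
    protein = []
    for pos in range(0, len(seq) - 2, 3):
        # Codons with N or other non-ACGT symbols have no table entry.
        protein.append(CODON_TABLE.get(seq[pos:pos + 3], "X"))
    return "".join(protein)


# First 30 bases and last 18 bases of <SEQ ID 51>.
print(translate("ATGGATTTTA AAAGACTCAC GGCGTTTTTC"))
print(translate("GGCGAAGTCG TTTCCTAA"))
```

Running the same function over a listing that contains `N` positions shows where the `X` residues in the corresponding protein listing come from.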

Computer analysis of this amino acid sequence gave the following results: 



Homology with a 60kDa inner-membrane protein (accession P25754) of Pseudomonas putida 
ORF11 and the 60kDa protein show 58% aa identity in 229 aa overlap (BLASTp). 

ORF11 2 LYAGPQTTSVIANIADNLQLAKDYGKVHWFASPLFWLLNQLHNIIGNWGWAIIVLTIIVK 61 

LYAGP+ S + ++ L+L DYG + + A P+FWLL +H+++GNWGW+IIVLT+++K 
60K 324 LYAGPKIQSKLECELSPGLELTVDYGFLWFIAQPIFWLLQHIHSLLGNWGWSIIVLTMLIK 383 

ORFll 62 AVLYPLTNASYRSMAKMRAAAPKLQAIKEKYGDDRXXXXXXXXXLYTDEKINPLGGCLPM 121 

+ +PL+ ASYRSMA+MRA APKL A+KE++GDDR LY EKINPLGGCLP+ 

60K 384 GLFFPLSAASYRSMARMRAVAPKLAALB(ERFGDDRQKMSQA^^MELYKKEKINPLGGCLPI 443 

ORFll 122 LLQIPVFIGLYWALFASVELRQAPWLGWITDLSRADPYYILPIIMAATMFAQTYLNPPPT 181 

L+Q+PVF+ LYW L SVE+RQAPW+ WITDLS DP++ILPIIM ATMF Q LNP P 
60K 444 LVQMPVFLALYWVLLESVEMRQAPWILWITDLSIKDPFFILPIIMGATMFIQQRLNPTPP 503 



ORFll 182 DPMQAKMMKIMPLVXXXXXXXXPAGXVLYWWNNLLTIAQQWHINRSIE 230 

DPMQAK+MK+MP++ PAG VLYWWNN L+I+QQW+I R IE 

60K 504 DPMQAKVMKMMPIIFTFFFLWFPAGLVLYWWNNCLSISQQWYITRRIE 552 



Homology with a predicted ORF from N. meningitidis (strain A) 

ORF11 shows 97.9% identity over a 240aa overlap with an ORF (ORF11a) from strain A of N. 
meningitidis: 



10 20 30 

orf 11 . pep NLYAGPQTTSVIANIADNLQLAKDYGKVHW 

I I I I I M I I I I I i I i t i I I I I i I t i I I I i 
orf 11a IKRRNDKLYSTSVSVPLAAIQNGAKSXASINLYAGPQTTSVIANIADNLQLXKDYGKVHW 
280 290 300 310 320 330 



40 50 60 70 80 90 

orf 11 . pep FASPLFWLLNQLHNIIGNWGWAIIVLTIIVKAVLYPLTNASYRSMAKMRAAAPKLQAIKE 
I I I I t t I I I i i i I I I I I I I I t I I I I t M I t I I ) I I I t I I I I I I i I t 1 I I 1 I I I i I I I I I I 
orf11a FASPLFWLLNQLHNIIGNWGWAIIVLTIIVKAVLYPLTNASYRSMAKMRAAAPKLQAIKE 
340 350 360 370 380 390 
100 110 120 130 140 150 

orf11.pep KYGDDRMAQQQAMMQLYTDEKINPLGGCLPMLLQIPVFIGLYWALFASVELRQAPWLGWI 
I I I [ M I I M I I I i I I I I t I I I I i I I I I I I I t I i I I I 1 I I I I i I I I I I I I I I I I t I i I I I 
or f 1 la KYGDDRMAQQQAMMQLYTDEKINPLGGCLPMLLQI PVFIGLYWALFASVELRQAPWLGWI 

400 410 420 430 440 450 

160 170 180 190 200 210 

orf 11 . pep TDLSRADPYYILPIIMAATMFAQTYLNPPPTDPMQAKMMKIMPLVFSXXFFFFPAGXVLY 
I t t I t I M I I I i i I I I i I I I If I I t t I I I I i I t I I I I I i I I I t I I I I M I till Ml 
orf 11a TDLSRADPYYILPIIMAATMFAQTYLNPPPTDPMQAKMMKIMPLVXSXXFFXFPAGLVLY 
460 470 480 490 500 510 

220 230 240 

or f 11 . pep WWNNLLTIAQQWHINRSIEKQRAQGEWSX 
M : I I II i I I I I t I I I M II I M I I II t III 
orf 11a WVINNLLTIAQQWHINRSIEKQRAQGEWSX 
520 530 540 

The complete length ORF11a nucleotide sequence <SEQ ID 53> is: 



1 ANGGATTTTA AAAGACTCAC NGNGTTTTTC GCCATCGCAC TGGTGATTAT 
51 GATCGGATNG NAAANGATGT TCCCCACTCC GAAGCCCGTC CCCGCGCCCC 
101 AACAGACGGC ACAACAACAG GCCGTAANCG CTTCCGCCGA AGCCGCGCTC 
151 GCGCCCGNAN CGCCGATTAC CGTAACGACC GACACGGTTC AAGCCGTCAT 
201 TGATGAAAAA AGCGGCGACC TGCGCCGGCT GACCCTGCTC AAATACAAAG 
251 CAACCGGCGA CNAAAATT^ CCGTTCATCC TGTTTGGCGA CGGCAAANAA 
301 TACACCTACN TCGCCCANTC CGAACTTTTG GACGCGCAGG GCAACAACAT 
351 TCTAAAAGGC ATCGGCTTTA GCGCACCGAA AAAACAGTAC AGCTTGGAAG 
401 GCGACAAAGT TGAAGTCCGC CTGAGCGCAC CTGAAACACG CGGTCTGAAA 
451 ATCGACAAAG TTTATACTTT CACCAAAGGC AGCTATCTGG TCAACGTCCG 
501 CTTCGACATC GCCAACGGCA GCGGTCAAAC CGCCAACCTG AGCGCGGACT 
551 ACCGCATCGT CCGCGACCAC AGCGAACCCG AGGGTCAAGG CTACTTTACC 
601 CACTCTTACG TCGGCCCTGT TGTTTATACC CCTGAAGGCA ACTTCCAAAA 
651 AGTCAGCTTC TCCGACTTGG ACGACGATGC CAANTCCGGN AAATCCGAGG 
701 CCGAATACAT CCGCAAAACC CNGACCGGCT GGCTCGGCAT GATTGAACAC 
751 CACTTCATGT CCACCTGGAT CCTCCAACCC AAAGGCGGAC AAAGCGTTTG 
801 CGCCGCTGGC GACTGCNGTA TNGACATCAA ACGCCGCAAC GACAAGCTGT 
851 ACAGCACCAG CGTCAGCGTG CCTTTAGCCG CTATCCAAAA CGGTGCGAAA 
901 TCCNAAGCCT CCATCAACCT CTACGCCGGC CCACAGACCA CATCNGTTAT 
951 CGCAAACATC GCCGACAACC TGCAACTGGN CAAAGACTAC GGCAAAGTAC 
1001 ACTGGTTCGC CTCCCCCCTC TTTTGGCTTT TGAACCAACT GCACAACATC 
1051 ATCGGCAACT GGGGCTGGGC GATTATCGTT TTAACCATCA TCGTCAAAGC 
1101 CGTACTGTAT CCATTGACCA ACGCCTCTTA CCGTTCGATG GCGAAAATGC 
1151 GTGCCGCCGC GCCCAAACTG CAAGCCATCA AAGAGAAATA CGGCGACGAC 
1201 CGTATGGCGC AGCAACAAGC CATGATGCAG CTTTACACAG ACGAGAAAAT 
1251 CAACCCGCTG GGCGGCTGCC TGCCTATGCT GTTGCAAATC CCCGTCTTCA 
1301 TCGGATTGTA TTGGGCATTG TTCGCCTCCG TAGAATTGCG CCAGGCACCT 
1351 TGGCTGGGTT GGATTACCGA CCTCAGCCGC GCCGACCCNT ACTACATCCT 
1401 GCCCATCATT ATGGCGGCAA CGATGTTCGC CCAAACCTAT CTGAACCCGC 
1451 CGCCGACCGA CCCGATGCAG GCGAAAATGA TGAAAATCAT GCCTTTGGTT 
1501 NTNTCNNNNA NGTTCTTCNN CTTCCCTGCC GGTCTGGTAT TGTACTGGGT 
1551 GATCAACAAC CTCCTGACCA TCGCCCAGCA ATGGCACATC AACCGCAGCA 
1601 TCGAAAAACA ACGCGCCCAA GGCGAAGTCG TTTCCTAA 



This encodes a protein having amino acid sequence <SEQ ID 54>: 



1 XDFKRLTXFF AIALVIMIGX XXMFPTPKPV PAPQQTAQQQ AVXASAEAAL 
51 APXXPITVTT DTVQAVIDEK SGDLRRLTLL KYKATGDXNK PFILFGDGKX 
101 YTYXAXSELL DAQGNNILKG IGFSAPKKQY SLEGDKVEVR LSAPETRGLK 
151 IDKVYTFTKG SYLVNVRFDI ANGSGQTANL SADYRIVRDH SEPEGQGYFT 
201 HSYVGPVVYT PEGNFQKVSF SDLDDDAXSG KSEAEYIRKT XTGWLGMIEH 
251 HFMSTWILQP KGGQSVCAAG DCXXDIKRRN DKLYSTSVSV PLAAIQNGAK 
301 SXASINLYAG PQTTSVIANI ADNLQLXKDY GKVHWFASPL FWLLNQLHNI 
351 IGNWGWAIIV LTIIVKAVLY PLTNASYRSM AKMRAAAPKL QAIKEKYGDD 
401 RMAQQQAMMQ LYTDEKINPL GGCLPMLLQI PVFIGLYWAL FASVELRQAP 
451 WLGWITDLSR ADPYYILPII MAATMFAQTY LNPPPTDPMQ AKMMKIMPLV 
501 XSXXFFXFPA GLVLYWVINN LLTIAQQWHI NRSIEKQRAQ GEVVS* 



ORF11a and ORF11-1 show 95.2% identity in 544 aa overlap: 



10 20 30 40 50 60 

orf11a.pep XDFKRLTXFFAIALVIMIGXXXMFPTPKPVPAPQQTAQQQAVXASAEAALAPXXPITVTT 
liltll tillllllllt lliliilllllll:ltlllt:ltlllllll :IIIM) 
orfll-1 MDFKRLTAFFAIALVIMIGWEKMFPTPKPVPAPQQAAQQQAVTASAEAALAPATPITVTT 

10 20 30 40 50 60 






70 80 90 100 110 120 

orf lla . pep DTVQAVIDEKSGDLRRLTLLKYKATGDXNKPFILFGDGKXYTYXAXSELLDAQGNNILKG 
I I t I I I I I I I M I It I I I t I I I I I t M I I I I I I I I t I I Ml I I I II II II i I I I II 
orfll-1 DTVQAVIDEKSGDLRRLTLLKYKATGDENKPFILFGDGKEYTYVAQSELLDAQGNNILKG 

70 80 90 100 110 120 



130 140 150 160 170 180 

orf11a.pep IGFSAPKKQYSLEGDKVEVRLSAPETRGLKIDKVYTFTKGSYLVNVRFDIANGSGQTANL 
llltllllllllllllllllllllllltllllMIIIMtlllllllllMIIIIIIIII 
orf11-1 IGFSAPKKQYSLEGDKVEVRLSAPETRGLKIDKVYTFTKGSYLVNVRFDIANGSGQTANL 

130 140 150 160 170 180 






190 200 210 220 230 240 

orf lla . pep SADYRIVRDHSEPEGQGYFTHSYVGPWYTPEGNFQKVSFSDLDDDAXSGKSEAEYIRKT 
I M I I I I I I I I I t I I I t 11 I I I t I I I I I I I t I I I II I 11 II I I I I I I I I I I H I I I I I I 
orfll-1 SADYRIVRDHSEPEGQGYFTHSYVGPWYTPEGNFQKVSFSDLDDDAKSGKSEAEYIRKT 

190 200 210 220 230 240 






250 260 270 280 290 300 

orf lla . pep XTGWLGMIEHHFMSTWILQPKGGQSVCAAGDCXXDIKRRNDKLYSTSVSVPLAAIQNGAK 
II lit I Ml I III 11 I IIMI 1111111:1 MIIIIIIIMIIIMIIIIIIIIM 
orf11-1 PTGWLGMIEHHFMSTWILQPKGRQSVCAAGECNIDIKRRNDKLYSTSVSVPLAAIQNGAK 

250 260 270 280 290 300 

310 320 330 340 350 360 

orf lla . pep SXASINLYAGPQTTSVIANIADNLQLXKDYGKVHWFASPLFWLLNQLHNIIGNWGWAIIV 
: M II I M M t I I II 11 I I I I I I I I I I M I I M I I II t I I I II I t II I I II I M t I M 
orf11-1 AEASINLYAGPQTTSVIANIADNLQLAKDYGKVHWFASPLFWLLNQLHNIIGNWGWAIIV 

310 320 330 340 350 360 

370 380 390 400 410 420 

orf11a.pep LTIIVKAVLYPLTNASYRSMAKMRAAAPKLQAIKEKYGDDRMAQQQAMMQLYTDEKINPL 
I I I It II II M M i II II I II M I M I II II I t M M M I I I It I M I 11 I I I I II II M 
orfll-1 LTIIVKAVLYPLTNASYRSMAKMRAAAPKLQAIKEKYGDDRMAQQQAMMQLYTDEKINPL 

370 380 390 400 410 420 






430 440 450 460 470 480 

orf lla . pep GGCLPMLLQIPVFIGLYWALFASVELRQAPWLGWITDLSRADPYYILPIIMAATMFAQTY 
t t I I I I I I 1 It 1 M I I t I I It 1 I I I It 1 I It 1 i 1 I II M 1 t I 1 It II M t I I I I II It It 
orfll-1 GGCLPMLLQIPVFIGLYWALFASVELRQAPWLGWITDLSRADPYYILPIIMAATMFAQTY 

430 440 450 460 470 480 






490 500 510 520 530 540 

orf lla . pep LNPPPTDPMQAKMMKIMPLVXSXXFFXFPAGLVLYWVINNLLTIAQQWHINRSIEKQRAQ 
11 I I It I 1 It 1 I M It I 1 tl I It II It i I II II : t I M 1 I 1 t t I II 1 It I I It I It 
orfll-1 LNPPPTDPMQAKMMKIMPLVFSVMFFFFPAGLVLYWWNNLLTIAQQWHINRSIEKQRAQ 

490 500 510 520 530 540 



orf11a.pep GEWSX 

I I I 1 1 I 

orfll-1 GEWSX 



Homology with a predicted ORF from N. gonorrhoeae 

ORF11 shows 96.3% identity over a 240aa overlap with a predicted ORF (ORF11.ng) from N. 
gonorrhoeae: 



orf11 NLYAGPQTTSVIANIADNLQLAKDYGKVHWFASPLFWLLNQLHNIIGNWGWAIIVLT 57 

M 1 1 It I t II I I I M 1 I I I M I I M I It It I t 1 It I t It I It I 1 1 1 1 It M MM It 

orf11ng MAVNLYAGPQTTSVIANIADNLQLAKDYGKVHWFASPLFWLLNQLHNIIGNWGWAIVVLT 60 

orf11 IIVKAVLYPLTNASYRSMAKMRAAAPKLQAIKEKYGDDRMAQQQAMMQLYTDEKINPLGG 117 

llllllltMltllllIltllllMI:||:|IIIMIIIIlllllllll: il:llllil 

orf11ng IIVKAVLYPLTNASYRSMAKMRAAAPELQTIKEKYGDDRMAQQQAMMQLFEDEEINPLGG 120 

orf11 CLPMLLQIPVFIGLYWALFASVELRQAPWLGWITDLSRADPYYILPIIMAATMFAQTYLN 177 

IIItlllltlillllDllllllillllllllllllttlttllllllMllillltlill 

orf11ng CLPMLLQIPVFIGLYWALFASVELRQAPWLGWITDLSRADPYYILPIIMAATMFAQTYLN 180 

orfll PPPTDPMQAKMMKIMPLVFSXXFFFFPAGXVLYWWNNLLTIAQQWHINRSIEKQRAQGE 237 

lilllllllllllillllll illMIt IIIIIIIMIIIMItllllttllltllll 

orfllng PPPTDPMQAKMMKIMPLVFSVMFFFFPAGLVLYWWNNLLTIAQQWHINRSIEKQRAQGE 240 



orfll WS 240 

I I I 



orfllng WS 243 

An ORF11ng nucleotide sequence <SEQ ID 55> was predicted to encode a protein having amino 
acid sequence <SEQ ID 56>: 



1 MAVNLYAGPQ TTSVIANIAD NLQLAKDYGK VHWFASPLFW LLNQLHNIIG 

51 NWGWAIVVLT IIVKAVLYPL TNASYRSMAK MRAAAPELQT IKEKYGDDRM 

101 AQQQAMMQLF EDEEINPLGG CLPMLLQIPV FIGLYWALFA SVELRQAPWL 

151 GWITDLSRAD PYYILPIIMA ATMFAQTYLN PPPTDPMQAK MMKIMPLVFS 

201 VMFFFFPAGL VLYWVVNNLL TIAQQWHINR SIEKQRAQGE VVS* 

Further sequence analysis revealed the complete gonococcal DNA sequence <SEQ ID 57> to be: 



1 ATGGATTTTA A7\AGACTCAC GGCGTTTTTC GCCATCGCGC TGGTGATTAT 
51 GATCGGCTGG GAAAAAATGT TCCCCACCCC GAAACCCGTC CCCGCGCCCC 
101 AACAGGCGGC ACAAAAACAG GCAGCAACCG CTTCCGCCGA AGCCGCGCTC 
151 GCGCCCGCAA CGCCGATTAC CGTAACGACC GACACGGTTC AAGCCGTTAT 
201 TGATGAAAAA AGTGGCGACC TGCGCCGGCT GACCCTGCTC AAATACAAAG 
251 CAACCGGCGA CGAAAACAAA CCGTTCGTCC TGTTTGGCGA CGGCAAAGAA 
301 TACACCTACG TCGCCCAATC CGAACTTTTG GACGCGCAGG GCAACAACAT 
351 TCTGAAAGGC ATCGGCTTTA GCGCACCGAA AAAACAGTAC ACCCTCAACG 
401 GCGACACAGT CGAAGTCCGC CTGAGCGCGC CCGAAACCAA CGGACTGAAA 
451 ATCGACAAAG TCTATACCTT TACCAAAGAC AGCTATCTGG TCAACGTCCG 
501 CTTCGACATC GCCAACGGCA GCGGTCAAAC CGCCAACCTG AGCGCGGACT 
551 ACCGCATCGT CCGCGACCAC AGCGAACCCG AGGGTCAAGG CTACTTTACC 
601 CACTCTTACG TCGGCCCTGT TGTTTATACC CCTGAAGGCA ACTTCCAAAA 
651 AGTCAGCTTC TCCgacTTgg acgACGATGC gaaaTccggc aaATccgagg 
701 ccgaatacaT CCGCAAAACC ccgaccggtt ggctcggcat gattgaacac 
751 cacttcatgt ccacctggat cctccAAcct aaaggcggcc aaaacgtttg 
801 cgcccaggga gactgccgta tcgacattaa aCgccgcaac gacaagctgt 
851 acagcgcaag cgtcagcgtg cctttaaccg ctatcccaac ccgggggcca 
901 aaaccgaaaa tggcggTCAA CCTGTATGCC GGTCCGCAAA CCACATCCGT 
951 TATCGCAAAC ATCGCcgacA ACCTGCAACT GGCAAAAGAC TACGGTAAAG 
1001 TACACTGGTT CGCATCGCCG CTCTTCTGGC TCCTGAACCA ACTGCACAAC 
1051 ATTATCGGCA ACTGGGGCTG GGCAATCGTC GTTTTGACCA TCATCGTCAA 
1101 AGCCGTACTG TATCCATTGA CCAACGcctc ctACCGTTCG ATGGCGAAAA 
1151 TGCGTGccgc cgcacCcaaA CTGCAGACCA TCATUVGAAAA ATAcgGCGAC 
1201 GACCGTATGG CGCAACAGCA AGCGATGATG CAGCTTTACA AAgacgAGAA 
1251 AATCAACCCG CTGGGCGGCT GTctgcctat gctgttgCAA ATCCCCGTCT 
1301 TCATCGGCTT GTACTGGGCA TTGTTCGCCT CCGTAGAATT GCGCCAGGCA 
1351 CCTTGGCTGG GCTGGATTAC CGACCTCAGC CGCGCCGACC CCTACTACAT 
1401 CCTGCCCATC ATTATGGCX3G CAACGATGTT CGCCCAAACC TATCTGAACC 
1451 CGCCGCCGAC CGACCCGATG CAGGCGAAAA TGATGAAAAT CATGCCGTTG 
1501 GTTTTCTCCG TCATGTTCTT CTTCTTCCCT GCCGGTTTGG TTCTCTACTG 
1551 GGTGGTCAAC AACCTCCTGA CCATCGCCCA GCAGTGGCAC ATCAACCGCA 
1601 GCATCGAAAA ACAACGCGCC CAAGGCGAAG TCGTTTCCTA A 

This encodes a protein having amino acid sequence <SEQ ID 58; ORF11ng-1>: 



1 MDFKRLTAFF AIALVIMIGW EKMFPTPKPV PAPQQAAQKQ AATASAEAAL
51 APATPITVTT DTVQAVIDEK SGDLRRLTLL KYKATGDENK PFVLFGDGKE
101 YTYVAQSELL DAQGNNILKG IGFSAPKKQY TLNGDTVEVR LSAPETNGLK
151 IDKVYTFTKD SYLVNVRFDI ANGSGQTANL SADYRIVRDH SEPEGQGYFT
201 HSYVGPVVYT PEGNFQKVSF SDLDDDAKSG KSEAEYIRKT PTGWLGMIEH
251 HFMSTWILQP KGGQNVCAQG DCRIDIKRRN DKLYSASVSV PLTAIPTRGP
301 KPKMAVNLYA GPQTTSVIAN IADNLQLAKD YGKVHWFASP LFWLLNQLHN
351 IIGNWGWAIV VLTIIVKAVL YPLTNASYRS MAKMRAAAPK LQTIKEKYGD
401 DRMAQQQAMM QLYKDEKINP LGGCLPMLLQ IPVFIGLYWA LFASVELRQA
451 PWLGWITDLS RADPYYILPI IMAATMFAQT YLNPPPTDPM QAKMMKIMPL
501 VFSVMFFFFP AGLVLYWVVN NLLTIAQQWH INRSIEKQRA QGEVVS*

ORF11ng-1 and ORF11-1 show 95.1% identity in 546 aa overlap:

                        10        20        30        40        50        60
orf11ng-1.pep   MDFKRLTAFFAIALVIMIGWEKMFPTPKPVPAPQQAAQKQAATASAEAALAPATPITVTT
                ||||||||||||||||||||||||||||||||||||||:||:||||||||||||||||||
orf11-1         MDFKRLTAFFAIALVIMIGWEKMFPTPKPVPAPQQAAQQQAVTASAEAALAPATPITVTT
                        10        20        30        40        50        60

                        70        80        90       100       110       120
orf11ng-1.pep   DTVQAVIDEKSGDLRRLTLLKYKATGDENKPFVLFGDGKEYTYVAQSELLDAQGNNILKG
                ||||||||||||||||||||||||||||||||:|||||||||||||||||||||||||||
orf11-1         DTVQAVIDEKSGDLRRLTLLKYKATGDENKPFILFGDGKEYTYVAQSELLDAQGNNILKG
                        70        80        90       100       110       120


130 140 150 160 170 180 

orfllng-l.pep IGFSAPKKQYTLNGDTVEVRLSAPETNGLKIDKVYTFTKDSYLVNVRFDIANGSGQTANL 
20 1111111111:1:11 I I t II t II I I I t I 1 1 t t I I t I 1 I 1 t I II I t t t 1 I I t t t 1 t I 1 

orfll-1 IGFSAPKKQYSLEGDKVEVRLSAPETRGLKIDKVYTFTKGSYLVNVRFDIANGSGQTANL 

130 140 150 160 170 180 



190 200 210 220 230 240 

25 orfllng-l.pep SADYRIVRDHSEPEGQGYFTHSYVGPWYTPEGNFQKVSFSDLDDDAKSGKSEAEYIRKT 

1 I I 1 It t I 1 I I I I I I 1 I i I I t I I I t I I I 1 I I 1 I I I I I II I t t I I 1 t I I I I I I I I I I i 1 i I 
orfll-1 SADYRIVRDHSEPEGQGYFTHSYVGPWYTPEGNFQKVSFSDLDDDAKSGKSEAEYIRKT 

190 200 210 220 230 240 






250 260 270 280 290 300 

orfllng-l.pep PTGWLGMIEHHFMSTWILQPKGGQNVCAQGDCRIDIKRRNDKLYSASVSVPLTAIPTRGP 

t I I 1 I 1 t I I I I II i I 1 t I I I I I t:tll 1:1 tllllltltttl:|tlttt:|t : I 
orfll-1 PTGWLGMIEHHFMSTWILQPKGRQSVCAAGECNIDIKRRNDKLYSTSVSVPLAAIQN-GA 

250 260 270 280 290 

310 320 330 340 350 360 

or f 1 Ing- 1 . pep KPKMAVNLYAGPQTTSVIANI ADNLQLAKDYGKVHWFASPLFWLLNQLHNI IGNWGWAI V 
1 : ::tlttlltMllttllllltitllMlltlttllMtllttttttilllltltl: 
orfll-1 KAEASINLYAGPQTTSVIANIADNLQLAKDYGKVHWFASPLFWLLNQLHNIIGNWGWAII 
300 310 320 330 340 350 



370 380 390 400 410 420 

orfllng-l.pep VLTIIVKAVLYPLTNASYRSMAKMRAAAPKLQTIKEKYGDDRMAQQQAMMQLYKDEKINP 
I I I II I I t I 1 I t I It I 1 I I 1 1 1 I I I I It 1 I t I: I 1 I 1 I t t t I I I I 1 1 I t I I t 1 I 1 I I 1 1 
45 orfll-1 VLTIIVKAVLYPLTNASYRSMAKMRAAAPKLQAIKEKYGDDRMAQQQAMMQLYTDEKINP 

360 370 380 390 400 410 

430 440 450 460 470 480 

orf 1 Ing- 1 . pep LGGCLPMLLQI PVFIGLYWALFASVELRQAPWLGWITDLSRADPYYILPI IMAATMFAQT 
50 I I I 1 I II I 1 I t f I I I I I 1 1 I t 1 I I I I I 1 I 1 I I I I t 1 I I 1 I I I I t 1 i t I 1 I M I I I t i I t t 

orfll-1 LGGCLPMLLQIPVFIGLYWALFASVELRQAPWLGWITDLSRADPYYILPIIMAATMFAQT 
420 430 440 450 460 470 



490 500 510 520 530 540 

55 orf llng-1 .pep YLNPPPTDPMQAKMMKIMPLVFSVMFFFFPAGLVLYWWNNLLTIAQQWHINRSIEKQRA 

I I 1 I I 1 t t I t 1 I I M 1 1 ) I 1 1 I I I 1 1 1 t t 1 M I 1 I I M I 1 1 1 I i t I I I 1 I 1 I M 1 I I I t I 
orfll-1 YLNPPPTDPMQAKMMKIMPLVFSVMFFFFPAGLVLYWWNNLLTIAQQWHINRSIEKQRA 
480 490 500 510 520 530 

60 

orf llng-1 . pep QGEWSX 
I I I I I I I 

orfli-1 QGEWSX 
540 

In addition, ORF11ng-1 shows significant homology with an inner-membrane protein from the
database (accession number p25754):

ID   60IM_PSEPU   STANDARD;   PRT;   560 AA.
AC   P25754;
DT   01-MAY-1992 (REL. 22, CREATED)
DT   01-MAY-1992 (REL. 22, LAST SEQUENCE UPDATE)
DT   01-NOV-1995 (REL. 32, LAST ANNOTATION UPDATE)
DE   60 KD INNER-MEMBRANE PROTEIN.

SCORES Init1: 1074 Initn: 1293 Opt: 1103
Smith-Waterman score: 1406; 41.5% identity in 574 aa overlap

10 20 30 40 

orf llng-1 . pep MDFKR LTAFFAIALVIMIGW EKMFPT PKPVPAPQQAAQKQ 

11:11 ::|: ::: I::: I : :|| I III :::|: : 

p25754 MDIKRTILIAALAWSYVMVLKWNDDYGQAALPTQNTAASTVAPGLPDGVPAGNNGASAD 

10 20 30 40 50 60 






50 60 70 80 90 

orf llng-1 . pep AATASAEAALAPATPIT VTTDTVQAVIDEKSGDLRRLTLLKYKATGDE-NKPF 

: : I : II : : I : I : : I II : : : : II : I I : : I : I | I I : I I I 

p25754 VPSANAESSPAELAPVALSKDLIRVKTDVLELAIDPVGGDIVQLNLPKYPRRQDHPNIPF 

70 80 90 100 110 120 






100 110 120 130 140 

orf llng-1 . pep VLFGDGKEYTYVAQSELLDAQGNNILKGIG FSAPKKQYTL-NGD TVEVRLSAPE 

11:11:1:111) ::l : :: I ::| :t:| I :|: :|::::| 

p25754 QLFDNGGERVYLAQSGLTGTDGPDA-RASGRPLYAAEQKSYQIADGQEQLWDLKFS 

130 140 150 160 170 






orf llng-1 .pep 
p25754 

orf llng-1. pep 
p25754 



orf llng-1 . pep 
p25754 



150 160 170 180 190 200 

TNGLKIDKVYTFTKDSYLVNVRFDIANGSGQTANLSADYRIVRDHS-EPEGQGYF-THSY 

t I : : I : : I : I : I I : I i I I I : I : : : I I I : I : : I : I 
DNGVNYIKRFS FKRGEYDLNVS YLI DNQSGQAWNGNMFAQLKRDASGDPSSSTATGTATY 

180 190 200 210 220 230 

210 220 230 240 250 260 

VGPWYTPEGNFQKVSFSDLDDDAKSGKSEAEYIRKTPTGWLGMIEHHFMSTWILQPKGG 
: I : : : I : : I II : : I : I I : : : I : : M : : : : I : I : : : I I I : 

LG;\ALWTASEPYKKVSMKDID KGSLKE NVSGGWVAWLQHYFVTAWI-PAKSD 

240 250 260 270 280 

270 280 290 300 310 320 

QNVCAQGDCRIDIKRRNDKLYSASVSVPLTAIPTRGPKPKMAVNLYAGPQTTSVIANIAD 
:|| :::::: I : : I : : : 1 : I I : : : | | | || : | : ::: 

NNV VQTRKDSQGNYIIGYTGPVISVPA-GGKVETSALLYAGPKIQSKLKELSP 

290 300 310 320 330 



50 



330 340 350 360 370 380 

orf llng-1 . pep NLQLAKDYGKVHWF-ASPLFWLLNQLHNIIGNWGWAIWLTIIVKAVLYPLTNASYRSMA 
: I : I : Ml : t I I : I : t I I I : : : I : : : I I II i : I : I I I : : : I : : : : t I : i I II I I I 
p25754 GLELTVDYGFL-WFIAQPIFWLLQHIHSLLGNWGWSIIVLTMLIKGLFFPLSAASYRSMA 
340 350 360 370 380 390 



55 



390 400 410 420 430 440 

orf llng-1 . pep KMRAAAPKLQTIKEKYGDDRMACXJQAMMQLYKDEKINPLGGCLPMLLQIPVFIGLYWALF 
:ll|:||lt ::tl::||||: ::MM:|II I I t I I I I I I t I : I : i : I I I : : I I I : I : 
p25754 RMRAVAPKLAALKERFGDDRQKMSQAMMELYKKEKINPLGGCLPILVO^PVFLALYWVLL 
400 410 420 430 440 450 



60 



65 



450 460 470 480 490 500 

orf llng-1 . pep ASVELRQAPWLGWITDLSRADPYYILPIIMAATMFAQTYLNPPPTDPMQAKMMKIMPLVF 
111:11111: t I I I II t I : : I I II I I : I I t t I III I t I I I I I : I I : I I : : I 
p25754 ESVEMRQAPWILWITDLSIKDPFFILPIIMGATMFIQQRLNPTPPDPMQAKVMKMMPIIF 
460 470 480 490 500 510 

510 520 530 540 

orf llng-1 . pep SVMFFFFPAGLVLYWWNNLLTIAQQWHINRSIEKQRAQGEWSX 

: :|::|ltlllilltlll l:l:lll:|:l II 
p25754 TFFFLWFPAGLVLYWWNNCLSISQQWYITRRIEAATKKAAA 
520 530 540 550 560

Based on this analysis, including the homology to an inner-membrane protein from P.putida and
the predicted transmembrane domains (seen in both the meningococcal and gonococcal proteins), it
is predicted that the proteins from N.meningitidis and N.gonorrhoeae, and their epitopes, could be
useful antigens for vaccines or diagnostics, or for raising antibodies.

Example 8 

The following partial DNA sequence was identified in N.meningitidis <SEQ ID 59>:

1 . . GCCGTCTTAA TCATCGAATT ATTGACGGGA ACGGTTTATC TTTTGGTTGT 

51 NAGCGCGGCT TTGGCGGGTT CGGGCATTGC TTACGGGCTG ACCGGCAGTA 

101 CGCCTGCCGC CGTCTTGACC GNCGCTCTGC TTTCCGCGCT GGGTATTTNG 

151 TTCGTACACG CCAAAACCGC CGTTAGAAAA GTTGAAACGG ATTCATATCA

201 GGATTTGGAT GCCGGACAAT ATGTCGAAAT CCTCCGNCAC ACAGGCGGCA 

251 ACCGTTACGA AGTT.TTTAT CGCGGTACG. ACTGGCAGGC TCAAAATACG 

301 GGGCAAGAAG AGCTTGAACC AGGAACTCGC GCCCTCATTG TCCGCAAGGA 

351 AGGCAACCTT CTTATTATCA CACACCCTTA A 

This corresponds to the amino acid sequence <SEQ ID 60; ORF13>:

1 ..AVLIIELLTG TVYLLVVSAA LAGSGIAYGL TGSTPAAVLT XALLSALGIX
51 FVHAKTAVRK VETDSYQDLD AGQYVEILRH TGGNRYEVXY RGTXWQAQNT
101 GQEELEPGTR ALIVRKEGNL LIITHP*

Further sequence analysis elaborated the DNA sequence slightly <SEQ ID 61>: 

1 ..GCCGTCTTAA TCATCGAATT ATTGACGGGA ACGGTTTATC TTTTGGTTGT

51 nAGCGCGGCT TTGGCGGGTT CGGGCATTGC TTACGGGCTG ACCGGCAGTA 

101 CGCCTGCCGC CGTCTTGACC GnCGCTCTGC TTTCCGCGCT GGGTATTTnG 

151 TTCGTACACG CCAAAACCGC CGTTAGAAAA GTTGAAACGG ATTCATATCA 

201 GGATTTGGAT GCCGGACAAT ATGTCGAAAT CCTCCGACAC ACAGGCGGCA 

251 ACCGTTACGA AGTTTTtTAT CGCGGTACGC ACTGGCAGGC TCAAAATACG 

301 GGGCAAGAAG AGCTTGAACC AGGAACTCGC GCCCTCATTG TCCGCAAGGA 

351 AGGCAACCTT CTTATTATCA CACACCCTTA A 

This corresponds to the amino acid sequence <SEQ ID 62; ORF13-1>:

1 ..AVLIIELLTG TVYLLVVSAA LAGSGIAYGL TGSTPAAVLT XALLSALGIX
51 FVHAKTAVRK VETDSYQDLD AGQYVEILRH TGGNRYEVFY RGTHWQAQNT
101 GQEELEPGTR ALIVRKEGNL LIITHP*

Computer analysis of this amino acid sequence gave the following results: 



Homology with a predicted ORF from N.meningitidis (strain A)

ORF13 shows 92.9% identity over a 126aa overlap with an ORF (ORF13a) from strain A of
N.meningitidis:

10 20 30 40 50 

orfl3.pep AVLIIELLTGTVYLLWSAALAGSGIAYGLTGSTPAAVLTXA LLSALGIXF 

i I ) I I I i I I I I t I I I t I I I I I i I I I I I I I I I i I I I I I I I I I I I I I I M I 
orfl3a MTVWFVAAVAVLIIELLTGTVYLLWSjUVLAGSGIAYGLTGSTPAAVLTAA LLSALGIWF 
10 20 30 40 50 60 



60 70 80 90 100 110 

orf 13 . pep VHAKTAVRKVETDSYQDLDAGQYVEILRHTGGNRYEVXYRGTXWQAQNTGQEELEPGTRA 
it t 1 I t I M I I I I I I I I I I I I 1 : I n [ I : I I I I I I I I I t i I t I I I t I I I I I I I I I I I 
orf 13a VHAKTAVGKVETDSYQDLDAGQYAEILRHAGGNRYEVFYRGTHWQAQNTGQEELEPGTRA 
70 80 90 100 110 120 



                   120
orf13.pep   LIVRKEGNLLIITHPX
            ||||||||||||::||
orf13a      LIVRKEGNLLIIAKPX
                   130

The complete length ORF13a nucleotide sequence <SEQ ID 63> is:

1 ATGACTGTAT GGTTTGTTGC CGCTGTTGCC GTCTTAATCA TCGAATTATT 

51 GACGGGAACG GTTTATCTTT TGGTTGTCAG CGCGGCTTTG GCGGGTTCGG 

101 GCATTGCTTA CGGGCTGACC GGCAGCACGC CTGCCGCCGT CTTGACCGCC 

151 GCTCTGCTTT CCGCGCTGGG TATTTGGTTC GTACACGCCA AAACCGCCGT

201 GGGAAAAGTT GAAACGGATT CATATCAGGA TTTGGATGCC GGGCAATATG 

251 CCGAAATCCT CCGGCACGCA GGCGGCAACC GTTACGAAGT TTTTTATCGC 

301 GGTACGCACT GGCAGGCTCA AAATACGGGG CAAGAAGAGC TTGAACCAGG 

351 AACGCGCGCC CTAATCGTCC GCAAGGAAGG CAACCTTCTT ATCATCGCAA 

401 AACCTTAA 

This encodes a protein having amino acid sequence <SEQ ID 64>: 

1 MTVWFVAAVA VLIIELLTGT VYLLVVSAAL AGSGIAYGLT GSTPAAVLTA
51 ALLSALGIWF VHAKTAVGKV ETDSYQDLDA GQYAEILRHA GGNRYEVFYR 
101 GTHWQAQNTG QEELEPGTRA LIVRKEGNLL IIAKP* 

ORF13a and ORF13-1 show 94.4% identity in 126 aa overlap:

10 20 30 40 50 60 

orf 13a . pep MTVWFVAAVAVLIIELLTGTVYLLVVSAALAGSGIAYGLTGSTPAAVLTAALLSALGIWF 

I M It t I I I i t I I I I I I I I I I 1 I I I I I I I I I i I I I I t I I ! I I I I I I t I I 
orf 13-1 AVLIIELLTGTVYLLWSAALAGSGIAYGLTGSTPAAVLTXALLSALGIXF 

10 20 30 40 50 

70 80 90 100 110 120 

orf 13a . pep VHAKTAVGKVETDSYQDLDAGQYAEILRHAGGNRYEVFYRGTHWQAQNTGQEELEPGTRA 
i I n i I I I I I I I ) I I I I I I I I I : I I I I I : I I M I ! I I I I t I I t I I I I I I I I I I t I I I I I 
or f 1 3 - 1 VHAKTAVRKVETDS YQDLDAGQYVE I LRHTGGNRYEVFYRGTHWQAQNTGQEELE PGTRA 

60 70 80 90 100 110 

130 

orf 13a. pep LIVRKEGNLLIIAKPX 
lllltllltll|::|l 
orf 13-1 LIVRKEGNLLIITHPX 
120 
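
The identity percentages quoted for these pairwise comparisons can be checked directly from the aligned rows. The following is a minimal sketch in Python, not the program used to produce the alignments above. It assumes that the two rows have already been aligned to equal length, that gap columns ("-") and unknown residues ("X") never count as matches, and that every column of the overlap counts towards the denominator. The function name `percent_identity` is hypothetical.

```python
def percent_identity(row_a: str, row_b: str) -> float:
    """Percent of alignment columns in which both rows carry the same residue.

    Assumptions of this sketch: the rows are pre-aligned and of equal length,
    and columns holding a gap ('-') or an unknown residue ('X') are never
    counted as identical, although they still count towards the overlap length.
    """
    if len(row_a) != len(row_b):
        raise ValueError("aligned rows must have equal length")
    if not row_a:
        raise ValueError("alignment is empty")
    matches = sum(1 for a, b in zip(row_a, row_b) if a == b and a not in "-X")
    return 100.0 * matches / len(row_a)


# C-terminal residues of ORF13a and ORF13-1 as listed above (stop marker omitted).
tail_a = "LIVRKEGNLLIIAKP"
tail_b = "LIVRKEGNLLIITHP"
print(f"{percent_identity(tail_a, tail_b):.1f}% identity over {len(tail_a)} columns")
```

Applied to full-length aligned rows rather than this short tail, the same function gives a figure comparable to the overlap percentages stated in the text, subject to the assumptions above about gaps and unknown residues.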

Homology with a predicted ORF from N.gonorrhoeae

ORF13 shows 89.7% identity over a 126aa overlap with a predicted ORF (ORF13.ng) from
N.gonorrhoeae:

o r f 1 3 AVLI lELLTGTVYLLWSAALAGSGIAYGLTGST PAAVLTXALLSALG IXF 5 1 

I I I I I I I M I I I I I I 11 I I I I M I t I t I f I I I I i I I I I I I I M I I t I I I 
orf 13ng MTVWFVAAVAVLI lELLTGTVYLLWSAALAGSGIAYGLTGSTPAAVLTAALLSALGIWF 60 

or f 1 3 VHAKTAVRKVETDS YQDLDAGQYVEILRHTGGNRYEVXYRGTXWQAQNTGQEELEPGTRA 111 

I I I I I I I M I I t I I I 1 i i : I : I : I [ I I : I 1! I I I I I I I I I I I t t I I I It : I I I M I 
orfl3ng VHAKTAVGKVETDSYQDLDTGKYAEILRYTGGNRYEVFYRGTHWQAQNTGQEVFEPGTRA 120 

orfl3 LIVRKEGNLLIITHP 126 

I I I I I t t I I I t I : : 1 
orfl3ng LIVRKEGNLLIIANP 135 

The complete length ORF13ng nucleotide sequence <SEQ ID 65> is:

1 ATGACTGTAT GGTTTGTTGC CGCTGTTGCC GTCTTAATCA TCGAATTATT 

51 GACGGGAACG GTTTATCTTT TGGTTGTCAG CGCGGCTTTG GCGGGTTCGG 

101 GCATTGCCTA CGGGCTGACT GGCAGCACGC CTGCCGCCGT CTTGACCGCC 

151 GCACTGCTTT CCGCGCTGGG CATTTGGTTC GTACATGCCA AAACCGCCGT 

201 GGGAAAAGTT GAAACGGATT CATATCAGGA TTTGGATACC GGAAAATATG 

251 CCGAAATCCT CCGATACACA GGCGGCAACC GTTACGAAGT TTTTTATCGC 

301 GGTACGCACT GGCAGGCGCA AAATACGGGG CAGGAAGTGT TTGAACCGGG 

351 AACGCGCGCC CTCATCGTCC GCAAAGAAGG TAACCTTCTT ATCATCGCAA 

401 ACCCTTAA

This encodes a protein having amino acid sequence <SEQ ID 66>:

1 MTVWFVAAVA VLIIELLTGT VYLLVVSAAL AGSGIAYGLT GSTPAAVLTA
51 ALLSALGIWF VHAKTAVGKV ETDSYQDLDT GKYAEILRYT GGNRYEVFYR
101 GTHWQAQNTG QEVFEPGTRA LIVRKEGNLL IIANP*

ORF13ng shows 91.3% identity in 126 aa overlap with ORF13-1:

10 20 30 40 50 

orf 13-1. pep AVLIIELLTGTVYLLWSAALAGSGIAYGLTGSTPAAVLTXALLSALGIXF 

I I I 1 I I I I I I I I I I I I I I I t i I I I I I I I I I M [ I I I I I M I I I I I I t I I 
orfl3ng MTVWFVAAVAVLIIELLTGTVYLLWSAALAGSGIAYGLTGSTPAAVLTAALLSALGIWF 

10 20 30 40 50 60 

60 70 80 90 100 110 

orf 13-1. pep VHAKTAVRKVETDSYQDLDAGQYVEILRHTGGNRYEVFYRGTHWQAQNTGQEELEPGTRA 
i I I t I I I I I ) t I I I I I t I : I : I : I I I I : I I t I I I I I t I M I M I I I I I I I I : I I I M I 
orfl3ng VHAKTAVGKVETDSYQDLDTGKYAEILRYTGGNRYEVFYRGTHWQAQNTGQEVFEPGTRA 

70 80 90 100 110 120 

120 

or f 1 3 - 1 . pep LI VRKEGNLLI ITHPX 
I I I I I t I i I t M : : I I 
orfl3ng LXVRKEGNLLIIANPX 

130 

Based on this analysis, including the extensive leader sequence in this protein, it is predicted that 
ORF13 and ORF13ng are likely to be outer membrane proteins. It is thus predicted that the proteins
from N.meningitidis and N.gonorrhoeae, and their epitopes, could be useful antigens for vaccines 
or diagnostics, or for raising antibodies. 

Example 9 

The following DNA sequence was identified in N.meningitidis <SEQ ID 67>: 

1 ATGTwTGATT TCGGTTTrGG CGArCTGGTT TTTGTCGGCA TTATCGCCCT 

51 GATwGtCCTC GGCCCCGAAC GCsTGCCCGA GGCCGCCCGC AyCGCCGGAC 

101 GGcTCATCGG CAGGCTGCAA CGCTTTGTCG GcAGCGTCAA ACAGGAATTT 

151 GACACTCAAA TCGAACTGGA AGAACTGAGG AAGGCAAAGC AGGAATTTGA

201 AGCTGCCGcC GCTCAGGTTC GAGACAGCCT CAAAGAAACC GGTACGGATA 

251 TGGAAGGCAA TCTGCACGAC ATTTCCGACG GTCTGAAGCC TTGGGAAAAA 

301 CTGCCCGAAC AGCGGACACC TGCCGATTTC GGTGTCGATG AAAACGGCAA 

351 TCCGCT.TCC CGATGCGGCA AACACCCTAT CAGACGGCAT TTCCGACGTT 

401 ATGCCGTC. . 

This corresponds to the amino acid sequence <SEQ ID 68; ORF2>: 

1 MXDFGLGELV FVGIIALIVL GPERXPEAAR XAGRLIGRLQ RFVGSVKQEF 
51 DTQIELEELR KAKQEFEAAA AQVRDSLKET GTDMEGNLHD ISDGLKPWEK 
101 LPEQRTPADF GVDENGNPXS RCGKHPIRRH FRRYAV. . 
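
The partial sequence above contains lower-case ambiguity letters (w, r, s, y), and the corresponding residues appear either as a definite amino acid or as X. A minimal sketch of how such a translation can be reproduced is given below. It assumes the standard genetic code and the IUPAC nucleotide ambiguity codes, and it treats any codon containing a character outside that alphabet (for example the "." placeholders) as X. The function names are hypothetical and this is not the software used to generate the listings.

```python
from itertools import product

# Standard genetic code, with codon positions ordered T, C, A, G.
BASES = "TCAG"
AMINO = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    a + b + c: AMINO[16 * i + 4 * j + k]
    for i, a in enumerate(BASES)
    for j, b in enumerate(BASES)
    for k, c in enumerate(BASES)
}

# IUPAC nucleotide ambiguity codes and the bases each one stands for.
IUPAC = {
    "A": "A", "C": "C", "G": "G", "T": "T",
    "R": "AG", "Y": "CT", "S": "CG", "W": "AT", "K": "GT", "M": "AC",
    "B": "CGT", "D": "AGT", "H": "ACT", "V": "ACG", "N": "ACGT",
}


def translate_codon(codon: str) -> str:
    """Return the residue for a codon, or 'X' if it cannot be resolved."""
    codon = codon.upper()
    if len(codon) != 3 or any(ch not in IUPAC for ch in codon):
        return "X"
    # Expand every ambiguity code and collect the residues that result.
    residues = {
        CODON_TABLE["".join(bases)]
        for bases in product(*(IUPAC[ch] for ch in codon))
    }
    return residues.pop() if len(residues) == 1 else "X"


def translate(dna: str) -> str:
    """Translate from the first base, ignoring whitespace between blocks."""
    dna = "".join(dna.split())
    return "".join(translate_codon(dna[i:i + 3]) for i in range(0, len(dna) - 2, 3))


# First thirty bases of the partial sequence listed above.
print(translate("ATGTwTGATT TCGGTTTrGG CGArCTGGTT"))
```

Codons such as TTr and GAr resolve to a single residue because both expansions encode the same amino acid, whereas TwT (TAT or TTT) does not and is reported as X.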

Further work revealed the complete nucleotide sequence <SEQ ID 69>: 

1 ATGTTTGATT TCGGTTTGGG CGAGCTGGTT TTTGTCGGCA TTATCGCCCT 

51 GATTGTCCTC GGCCCCGAAC GCCTGCCCGA GGCCGCCCGC ACCGCCGGAC 

101 GGCTCATCGG CAGGCTGCAA CGCTTTGTCG GCAGCGTCAA ACAGGAATTT 

151 GACACTCAAA TCGAACTGGA AGAACTGAGG AAGGCAAAGC AGGAATTTGA 

201 AGCTGCCGCC GCTCAGGTTC GAGACAGCCT CAAAGAAACC GGTACGGATA 

251 TGGAAGGCAA TCTGCACGAC ATTTCCGACG GTCTGAAGCC TTGGGAAAAA 

301 CTGCCCGAAC AGCGGACACC TGCCGATTTC GGTGTCGATG AAAACGGCAA 

351 TCCGCTTCCC GATGCGGCAA ACACCCTATC AGACGGCATT TCCGACGTTA 

401 TGCCGTCCGA ACGTTCCTAC GCTTCCGCCG AAACCCTTGG GGACAGCGGG 

451 CAAACCGGCA GTACAGCCGA ACCCGCGGAA ACCGACCAAG ACCGCGCATG 

501 GCGGGAATAC CTGACTGCTT CTGCCGCCGC ACCCGTCGTA CAGACCGTCG 
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551 AAGTCAGCTA TATCGATACT GCTGTTGAAA CGCCTGTTCC GCACACCACT 
601 TCCCTGCGCA AACAGGCAAT AAGCCGCAAA CGCGATTTTC GTCCGAAACA 
651 CCGCGCCAAA CCTAAATTGC GCGTCCGTAA ATCATAA 

This corresponds to the amino acid sequence <SEQ ID 70; ORF2-1>:

1 MFDFGLGELV FVGIIALIVL GPERLPEAAR TAGRLIGRLQ RFVGSVKQEF
51 DTQIELEELR KAKQEFEAAA AQVRDSLKET GTDMEGNLHD ISDGLKPWEK
101 LPEQRTPADF GVDENGNPLP DAANTLSDGI SDVMPSERSY ASAETLGDSG
151 QTGSTAEPAE TDQDRAWREY LTASAAAPVV QTVEVSYIDT AVETPVPHTT
201 SLRKQAISRK RDFRPKHRAK PKLRVRKS*

Further work identified the corresponding gene in strain A of N.meningitidis <SEQ ID 71>:

1 ATGTTTGATT TCGGTTTGGG CGAGCTGGTT TTTGTCGGCA TTATCGCCCT 

51 GATTGTCCTC GGCCCCGAAC GCCTGCCCGA GGCCGCCCGC ACCGCCGGAC 

101 GGCTCATCGG CAGGCTGCAA CGCTTTGTCG GCAGCGTCAA ACAGGAATTT 

151 GACACGCAAA TCGAACTGGA AGAACTAAGG AAGGCAAAGC AGGAATTTGA 

201 AGCTGCCGCT GCTCAGGTTC GAGACAGCCT CAAAGAAACC GGTACGGATA 

251 TGGAGGGTAA TCTGCACGAC ATTTCCGACG GTCTGAAGCC TTGGGAAAAA 

301 CTGCCCGAAC AGCGCACGCC TGCTGATTTC GGTGTCGATG AAAACGGCAA 

351 TCCCTTTCCC GATGCGGCAA ACACCCTATT AGACGGCATT TCCGACGTTA 

401 TGCCGTCCGA ACGTTCCTAC GCTTCCGCCG AAACCCTTGG GGACAGCGGG 

451 CAAACCGGCA GTACAGCCGA ACCCGCGGAA ACCGACCAAG ACCGTGCATG 

501 GCGGGAATAC CTGACTGCTT CTGCCGCCGC ACCCGTCGTA CAGACCGTCG 

551 AAGTCAGCTA TATCGATACC GCTGTTGAAA CCCCTGTTCC GCATACCACT 

601 TCGCTGCGTA AACAGGCTW^T AAGCCGCAAA CGCGATTTGC GTCCTAAATC 

651 CCGCGCCAAA CCTAAATTGC GCGTCCGTAA ATCATAA 

This encodes a protein having amino acid sequence <SEQ ID 72; ORF2a>:

1 MFDFGLGELV FVGIIALIVL GPERLPEAAR TAGRLIGRLQ RFVGSVKQEF
51 DTQIELEELR KAKQEFEAAA AQVRDSLKET GTDMEGNLHD ISDGLKPWEK
101 LPEQRTPADF GVDENGNPFP DAANTLLDGI SDVMPSERSY ASAETLGDSG
151 QTGSTAEPAE TDQDRAWREY LTASAAAPVV QTVEVSYIDT AVETPVPHTT
201 SLRKQAISRK RDLRPKSRAK PKLRVRKS*

The originally-identified partial strain B sequence (ORF2) shows 97.5% identity over a 118aa 
overlap with ORF2a:

10 20 30 40 50 60 

orf 2 . pep MXD FGLGELVFVGIIALIVL GPERXPEAARXAGRLIGRLQRFVGSVKQEFDTQIELEELR 
I I I 1 I I t I I I t I I t I I i I I I I I I i I I I I : i I I I I i i i i 1 I I I I t I I I I I I I I I I I t I i 
orf 2a MFD FGLGELVFVGIIALIVL GPERLPEAARTAGRLIGRLQRFVGSVKQEFDTQIELEELR 

10 20 30 40 50 60 

70 80 90 100 110 120 

or f 2 . pep BCAKQEFEAAAAQVRDSLKETGTDMEGNLHDISDGLKPWEKLPEQRTPADFGVDENGNPXS 

I I I i I I I I t I M I n I I I I I I t [ I i I [ I I M I I I I I I I I I t M I 1 I I I I I t M I I I I I 
orf 2a KAKQEFEAAAAQVRDSLKETGTDMEGNLHDISDGLKPWEKLPEQRTPADFGVDENGNPFP 

70 80 90 100 110 120 

130 

orf 2 . pep RCGKHPIRRHFRRYAV 

orf 2a DAANTLLDGISDVMPSERSYASAETLGDSGQTGSTAEPAETDQDRAWREYLTASAAAPW 
130 140 150 160 170 180 

The complete strain B sequence (ORF2-1) and ORF2a show 98.2% identity in 228 aa overlap:

orf 2a . pep MFDFGLGELVFVGIIALIVLGPERLPEAARTAGRLIGRLQRFVGSVKQEFDTQIELEELR 60 

t I i I t I I t I I M I I t I i i t I I M I I I I I I I I I I I I I it I M I I i I I I I I I I I I I I I t t I I 
orf 2-1 MFDFGLGELVFVGIIALIVLGPERLPEAARTAGRLIGRLQRFVGSVKQEFDTQIELEELR 60 

orf 2a . pep KAKQEFEAAAAQVRDSLKETGTDMEGNLHDISDGLKPWEKLPEQRTPADFGVDENGNPFP 120 

I I I I I I M I 1 I I 1 I I I i I I I t I i t I M I I I I I I i t I I M I n I i I I I t I I I I I t M I t : I 
orf 2-1 KAKQEFEAAAAQVRDSLKETGTDMEGNLHDISDGLKPWEKLPEQRTPADFGVDENGNPLP 120 

orf 2a . pep DAANTLLDGISDVMPSERSYASAETLGDSGQTGSTAEPAETDQDRAWREYLTASAAAPW 180 

I i I I I I I I I M I I I t I I I I I I I I j I I I I I t I I I ) I I I I I I I ) I I I I I I M I I I I I I I I I 
orf 2-1 DAANTLSDGISDVMPSERSYASAETLGDSGQTGSTAEPAETDQDRAWREYLTASAAAPW 180 

orf 2a . pep QTVEVSYIDTAVETPVPHTTSLRKQAISRKRDLRPKSRAKPKLRVRKSX 229 

I I I I I I I I I I I M [ I I I I I I I I I i I I I I t I I t : I I I I I I I I I I I I i I I 
orf2-l QTVEVSYIDTAVETPVPHTTSLRKQAISRKRDFRPKHRAKPKLRVRKSX 229 

Further work identified a partial DNA sequence <SEQ ID 73> in N.gonorrhoeae encoding the
following amino acid sequence <SEQ ID 74; ORF2ng>:

1 MFDFGLGELI FVGIIALIVL GPERLPEAAR TAGRLIGRLQ RFVGSVKQEL
51 DTQIELEELR KVKQAFEAAA AQVRDSLKET DTDMQNSLHD ISDGLKPWEK
101 LPEQRTPADF GVDEKGNSLS RYGKHRIRRH FRRYAV*

Further work identified the complete gonococcal gene sequence <SEQ ID 75>:

1 ATGTTTGATT TCGGTTTGGG CGAGCTGATT TTTGTCGGCA TTATCGCCCT
51 GATTGTCCTT GGTCCAGAAC GCCTGCCCGA AGCCGCCCGC ACTGCCGGAC
101 GGCTTATCGG CAGGCTGCAA CGCTTTGTAG GAAGCGTCAA ACAAGAACTT
151 GACACTCAAA TCGAACTGGA AGAGCTGAGG AAGGTCAAGC AGGCATTCGA
201 AGCTGCCGCC GCTCAGGTTC GAGACAGCCT CAAAGAAACC GATACGGATA
251 TGCAGAACAG TCTGCACGAC ATTTCCGACG GTCTGAAGCC TTGGGAAAAA
301 CTGCCCGAAC AGCGCACGCc tgccgatttc gGTGTCGATg AAAacggcaa
351 tccccttccc gATACGGCAA ACACCGTATC AGACGGCATT TCCGACGTTA
401 TGCCGTCTGA ACGTTCCGAT ACTtccgcCG AAACCCTTGG GGACGACAGG
451 CAAACCGGCA GTACAGCCGA ACCTGCGGAA ACCGACAAAG ACCGCGCATG
501 GCGGGAATAC CTGactgctt ctgccgccgc acctgtcgta Cagagggccg
551 tcgaagtcag ctaTATCGAT ACTGCTGTTG AAacgcctgT tccgcaCacc
601 acttccctgc gcaAACAGGC AATAAACCGC AAACGCGATT TttgtccgaA
651 ACACCGCGCc aAACCGAAat tgcgcgtcCG TAAATCATAA



This encodes a protein having the amino acid sequence <SEQ ID 76; ORF2ng-l>: 

1 MFDFGLGELI FVGIIALIVL GPERLPEAAR TAGRLIGRLQ RFVGSVKQEL
51 DTQIELEELR KVKQAFEAAA AQVRDSLKET DTDMQNSLHD ISDGLKPWEK
101 LPEQRTPADF GVDENGNPLP DTANTVSDGI SDVMPSERSD TSAETLGDDR
151 QTGSTAEPAE TDKDRAWREY LTASAAAPVV QRAVEVSYID TAVETPVPHT
201 TSLRKQAINR KRDFCPKHRA KPKLRVRKS*

The originally-identified partial strain B sequence (ORF2) shows 87.5% identity over a 136aa
overlap with ORF2ng:

orf 2 . pep MXDFGLGELVFVGIIALIVLGPERXPEAARXAGRLIGRLQRFVGSVKQEFDTQIELEELR 60 

t I I I I I M : I I I t I t i I I I I I I I I t I I I : I I i I I I I [ I M I [ 1 I I I I : I i I I I I I M t 
orf2ng MFDFGLGELIFVGIIALIVLGPERLPEAARTAGRLIGRLQRFVGSVKQELDTQIELEELR 60 

orf 2 . pep KAKQEFEAAAAQVRDSLKETGTDMEGNLHDISDGLKPWEKLPEQRTPADFGVDENGNPXS 120 

1:11 I I i I I I I I I I I I I I I I t I : : : I I I I I t [ I I I I t I M I I I I I I I I I n I : t I 
orf2ng KVKQAFEAAAAQVRDSLKETDTDMQNSLHDISDGLKPWEKLPEQRTPADFGVDEKGKSLP 120 

orf 2. pep RCGKHPIRRHFRRYAV 136 

I til II II II il II 
orf2ng RYGKHRIRRHFRRYAV 136 

The complete strain B and gonococcal sequences (ORF2-1 & ORF2ng-1) show 91.7% identity in
229 aa overlap:

orf 2-1. pep 
orf2ng-l 

orf 2-1. pep 
orf2ng-l 



10 20 30 40 50 60 

MFDFGLGELVFVGIIALIVLGPERLPEAARTAGRLIGRLQRFVGSVKQEFDTQIELEELR 
I I t t II II i : I M II I I I I I I 1 I II i I I II I I I t II M I I I I I I I I II I : II n I I II II 
MFDFGLGELI FVGIIALIVLGPERLPEAARTAGRLIGRLQRFVGSVKQELDTQIELEELR 

10 20 30 40 50 60 

70 80 90 100 110 120 

KAKQEFEAAAAQVRDSLKETGTDMEGNLHDISDGLKPWEKLPEQRTPADFGVDENGNPLP 
I : t I I M I II I I I It I I I t t It ::: I I I I M I I II M I II I I II I II I I It II I II I t 
KVKQAFEAAAAQVRDSLKETDTDMQNSLHDISDGLKPWEKLPEQRTPADFGVDENGNPLP
70 80 90 100 110 120

130 140 150 160 170 180 

orf 2-1 . pep DAANTLSDGISDVMPSERSYASAETLGDSGQTGSTAEPAETDQDRAWREYLTASAAAPW 
5 |:ill:|||IMilltlll llllllllliihiltltlltllillllll 

orf2ng-l DTANTVSDGISDVMPSERSDTSAETLGDDRQTGSTAEPAETDKDRAWREYLTASAAAPW 

130 140 150 160 170 180 

190 200 210 220 229 

10 orf 2-1 .pep Q-TVEVSYIDTAVETPVPHTTSLRKQAISRKRDFRPKHRAKPKLRVRKSX 

I : I I I I I i I I I I I I I I I I I M I t I I I I : I t It I I I I I I I I I I M I I I I 
orf2ng-l QRAVEVSYIDTAVETPVPHTTSLRKQAINRKRDFCPKHRAKPKLRVRKSX 

190 200 210 220 230 

Computer analysis of these amino acid sequences indicated a transmembrane region (underlined),
and also revealed homology (34% identity, 59% positives) between the gonococcal sequence and the
TatB protein of E.coli:



20 



gnl I PID 1 61292181 (AJ005830) TatB protein [Escherichia coli] Length = 171 
Score = 56.6 bits (134), Expect = 1e-07

Identities = 30/88 (34%), Positives = 52/88 (59%), Gaps = 1/88 (1%) 

Query: 1 MFDFGLGELIFVGIIALIVLGPERLPEAARTAGRLIGRLQRFVGSVKQELDTQIELEELR 60 

MFD G EL+ V II L+VLGP+RLP A +T I L+ +V+ EL +++L+E + 
Sbjct: 1 MFDIGFSELLLVFIIGLVVLGPQRLPVAVKTVAGWIRALRSLATTVQNELTQELKLQEFQ 60



Query: 61 -KVKQAFEAAAAQVRDSLKETDTDMQNS 87

+K+ +A+ + LK + +++ + 
Sbjct: 61 DSLKKVEKASLTNLTPELKASMDELRQA 88
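
The summary line of the database hit reports identities and positives separately: identities are columns with the same residue, while positives also include conservative substitutions (shown as "+" in the midline). A short sketch of the arithmetic behind the bracketed percentages follows. The function name `blast_summary` is hypothetical, and rounding to the nearest whole percent is an assumption chosen to match the figures printed above.

```python
def blast_summary(identities: int, positives: int, gaps: int, length: int) -> dict:
    """Convert the raw column counts of a hit into whole-number percentages."""
    if length <= 0:
        raise ValueError("alignment length must be positive")

    def pct(count: int) -> int:
        return round(100 * count / length)

    return {
        "identity": pct(identities),
        "positives": pct(positives),
        "gaps": pct(gaps),
    }


# Counts taken from the TatB hit above: 30/88 identities, 52/88 positives, 1/88 gaps.
print(blast_summary(30, 52, 1, 88))
```

On these counts the 59% figure corresponds to positives, and the fraction of strictly identical residues is 34%.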

Based on this analysis, it was predicted that ORF2, ORF2a and ORF2ng are likely to be membrane
proteins and so the proteins from N.meningitidis and N.gonorrhoeae, and their epitopes, could be
useful antigens for vaccines or diagnostics, or for raising antibodies.

ORF2-1 (16kDa) was cloned in pET and pGex vectors and expressed in E.coli, as described above.
The products of protein expression and purification were analyzed by SDS-PAGE. Figure 3A
shows the results of affinity purification of the GST-fusion protein, and Figure 3B shows the results
of expression of the His-fusion in E.coli. Purified GST-fusion protein was used to immunise mice,
whose sera were used for Western blots (Figure 3C), ELISA (positive result), and FACS analysis
(Figure 3D). These experiments confirm that ORF37-1 is a surface-exposed protein, and that it is
a useful immunogen.

Example 10 

The following partial DNA sequence was identified in N.meningitidis <SEQ ID 77>:

1 ATGCAAGCAC GGCTGCTGAT ACCTATTCTT TTTTCAGTTT TTATTTTATC

51 CGC.TGCGGG ACACTGACAG GTATTCCATC GCATGGCGgA GkTAAACgCT 

101 TTgCGGTCGA ACAAGAACTT GTGGCCGCTT CTGCCAGAGC TGCCGTTAAA 

151 GACATGGATT TACAGGCATT ACACGGACGA AAAGTTGCAT TGTACATTGC 

201 CACTATGGGC GACCAAGGTT CAGGcAGTTT GACAGGGGGG TCGCTACTCC 

251 ATTGATGCAC kOrTwCsTGG CGAATACATA AACAGCCCTG CCGTCCGTAC

301 CGATTACACC TATCCACGTT ACGAAACCAC CGCTGAAACA ACATCAGGCG 

351 GTTTGACAGG TTTAACCACT TCTTTATCTA CACTTAATGC CCCTGCACTC 

401 TCTCGCACCC AATCAGACGG TAGCGGAAGT AAAAGCAGTC TGGGCTTAAA 

451 TATTGGCGGG ATGGGGGATT ATCGAAATGA AACCTTGACG ACTAACCCGC 
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501 GCGACACTGC CTTTCTTTCC CACTTGGTAC AGACCGTATT TTTCCTGCGC 

551 GGCATAGACG TTGTTTCTCC TGCCAATGCC GATACAGATG TGTTTATTAA 

601 CATCGACGTA TTCGGAACGA TACGCAACAG AACCGAAATG. . 

This corresponds to the amino acid sequence <SEQ ID 78; ORF15>:

1 MQARLLIPIL FSVFILSACG TLTGIPSHGG XKRFAVEQEL VAASARAAVK
51 DMDLQALHGR KVALYIATMG DQGSGSLTGG RYSIDAXXXG EYINSPAVRT
101 DYTYPRYETT AETTSGGLTG LTTSLSTLNA PALSRTQSDG SGSKSSLGLN
151 IGGMGDYRNE TLTTNPRDTA FLSHLVQTVF FLRGIDVVSP ANADTDVFIN
201 IDVFGTIRNR TEM..

Further work revealed the complete nucleotide sequence <SEQ ID 79>:

1 ATGCAAGCAC GGCTGCTGAT ACCTATTCTT TTTTCAGTTT TTATTTTATC
51 CGCCTGCGGG ACACTGACAG GTATTCCATC GCATGGCGGA GGTAAACGCT
101 TTGCGGTCGA ACAAGAACTT GTGGCCGCTT CTGCCAGAGC TGCCGTTAAA
151 GACATGGATT TACAGGCATT ACACGGACGA AAAGTTGCAT TGTACATTGC
201 CACTATGGGC GACCAAGGTT CAGGCAGTTT GACAGGGGGT CGCTACTCCA
251 TTGATGCACT GATTCGTGGC GAATACATAA ACAGCCCTGC CGTCCGTACC
301 GATTACACCT ATCCACGTTA CGAAACCACC GCTGAAACAA CATCAGGCGG
351 TTTGACAGGT TTAACCACTT CTTTATCTAC ACTTAATGCC CCTGCACTCT
401 CTCGCACCCA ATCAGACGGT AGCGGAAGTA AAAGCAGTCT GGGCTTAAAT
451 ATTGGCGGGA TGGGGGATTA TCGAAATGAA ACCTTGACGA CTAACCCGCG
501 CGACACTGCC TTTCTTTCCC ACTTGGTACA GACCGTATTT TTCCTGCGCG
551 GCATAGACGT TGTTTCTCCT GCCAATGCCG ATACAGATGT GTTTATTAAC
601 ATCGACGTAT TCGGAACGAT ACGCAACAGA ACCGAAATGC ACCTATACAA
651 TGCCGAAACA CTGAAAGCCC AAACAAAACT GGAATATTTC GCAGTAGACA
701 GAACCAATAA AAAATTGCTC ATCAAACCAA AAACCAATGC GTTTGAAGCT
751 GCCTATAAAG AAAATTACGC ATTGTGGATG GGGCCGTATA AAGTAAGCAA
801 AGGAATTAAA CCGACGGAAG GATTAATGGT CGATTTCTCC GATATCCGAC
851 CATACGGCAA TCATACGGGT AACTCCGCCC CATCCGTAGA GGCTGATAAC
901 AGTCATGAGG GGTATGGATA CAGCGATGAA GTAGTGCGAC AACATAGACA
951 AGGACAACCT TGA

This corresponds to the amino acid sequence <SEQ ID 80; ORF15-1>:

1 MQARLLIPIL FSVFILSACG TLTGIPSHGG GKRFAVEQEL VAASARAAVK
51 DMDLQALHGR KVALYIATMG DQGSGSLTGG RYSIDALIRG EYINSPAVRT
101 DYTYPRYETT AETTSGGLTG LTTSLSTLNA PALSRTQSDG SGSKSSLGLN
151 IGGMGDYRNE TLTTNPRDTA FLSHLVQTVF FLRGIDVVSP ANADTDVFIN
201 IDVFGTIRNR TEMHLYNAET LKAQTKLEYF AVDRTNKKLL IKPKTNAFEA
251 AYKENYALWM GPYKVSKGIK PTEGLMVDFS DIRPYGNHTG NSAPSVEADN
301 SHEGYGYSDE VVRQHRQGQP *

Further work identified the corresponding gene in strain A of N.meningitidis <SEQ ID 81>:

1 ATGCAAGCAC GGCTGCTGAT ACCTATTCTT TTTTCAGTTT TTATTTTATC

51 CGCCTGCGGG ACACTGACAG GTATTCCATC GCATGGCGGA GGTAAACGCT 

101 TTGCGGTCGA ACAAGAACTT GTGGCCGCTT CTGCCAGAGC TGCCGTTAAA 

151 GACATGGATT TACAGGCATT ACACGGACGA AAAGTTGCAT TGTACATTGC 

201 AACTATGGGC GACCAAGGTT CAGGCAGTTT GACAGGGGGT CGCTACTCCA 

251 TTGATGCACT GATTCGTGGC GAATACATAA ACAGCCCTGC CGTCCGTACC

301 GATTACACCT ATCCACGTTA CGAAACCACC GCTGAAACAA CATCAGGCGG

351 TTTGACAGGT TTAACCACTT CTTTATCTAC ACTTAATGCC CCTGCACTCT 

401 CGCGCACCCA ATCAGACGGT AGCGGAAGTA AAAGCAGTCT GGGCTTAAAT 

451 ATTGGCGGGA TGGGGGATTA TCGAAATGAA ACCTTGACGA CTAACCCGCG

501 CGACACTGCC TTTCTTTCCC ACTTGGTACA GACCGTATTT TTCCTGCGCG

551 GCATAGACGT TGTTTCTCCT GCCAATGCCG ATACGGATGT GTTTATTAAC 

601 ATCGACGTAT TCGGAACGAT ACGCAACAGA ACCGAAATGC ACCTATACAA 

651 TGCCGAAACA CTGAAAGCCC AAACAAAACT GGAATATTTC GCAGTAGACA 

701 GAACCAATAA AAAATTGCTC ATCAAACCAA AAACCAATGC GTTTGAAGCT 

751 GCCTATAAAG AAAATTACGC ATTGTGGATG GGACCGTATA AAGTAAGCAA

801 AGGAATTAAA CCGACAGAAG GATTAATGGT CGATTTCTCC GATATCCAAC 

851 CATACGGCAA TCATATGGGT AACTCTGCCC CATCCGTAGA GGCTGATAAC 

901 AGTCATGAGG GGTATGGATA CAGCGATGAA GCAGTGCGAC GACATAGACA 

951 AGGGCAACCT TGA 

This encodes a protein having amino acid sequence <SEQ ID 82; ORF15a>:

1 MQARLLIPIL FSVFILSACG TLTGIPSHGG GKRFAVEQEL VAASARAAVK
51 DMDLQALHGR KVALYIATMG DQGSGSLTGG RYSIDALIRG EYINSPAVRT
101 DYTYPRYETT AETTSGGLTG LTTSLSTLNA PALSRTQSDG SGSKSSLGLN
151 IGGMGDYRNE TLTTNPRDTA FLSHLVQTVF FLRGIDVVSP ANADTDVFIN
201 IDVFGTIRNR TEMHLYNAET LKAQTKLEYF AVDRTNKKLL IKPKTNAFEA
251 AYKENYALWM GPYKVSKGIK PTEGLMVDFS DIQPYGNHMG NSAPSVEADN
301 SHEGYGYSDE AVRRHRQGQP



The originally-identified partial strain B sequence (ORF15) shows 98.1% identity over a 213aa
overlap with ORF15a:






                    10        20        30        40        50        60
orf15.pep   MQARLLIPILFSVFILSACGTLTGIPSHGGXKRFAVEQELVAASARAAVKDMDLQALHGR
            |||||||||||||||||||||||||||||| |||||||||||||||||||||||||||||
orf15a      MQARLLIPILFSVFILSACGTLTGIPSHGGGKRFAVEQELVAASARAAVKDMDLQALHGR
                    10        20        30        40        50        60

                    70        80        90       100       110       120
orf15.pep   KVALYIATMGDQGSGSLTGGRYSIDAXXXGEYINSPAVRTDYTYPRYETTAETTSGGLTG
            ||||||||||||||||||||||||||   |||||||||||||||||||||||||||||||
orf15a      KVALYIATMGDQGSGSLTGGRYSIDALIRGEYINSPAVRTDYTYPRYETTAETTSGGLTG
                    70        80        90       100       110       120

                   130       140       150       160       170       180
orf15.pep   LTTSLSTLNAPALSRTQSDGSGSKSSLGLNIGGMGDYRNETLTTNPRDTAFLSHLVQTVF
            ||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
orf15a      LTTSLSTLNAPALSRTQSDGSGSKSSLGLNIGGMGDYRNETLTTNPRDTAFLSHLVQTVF
                   130       140       150       160       170       180

                   190       200       210
orf15.pep   FLRGIDVVSPANADTDVFINIDVFGTIRNRTEM
            |||||||||||||||||||||||||||||||||
orf15a      FLRGIDVVSPANADTDVFINIDVFGTIRNRTEMHLYNAETLKAQTKLEYFAVDRTNKKLL
                   190       200       210       220       230       240
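
The identity figure quoted for this overlap can be checked by counting matching columns. The following is a minimal sketch, not the alignment program used for the original analysis. It assumes that ambiguous residues (X) are scored as non-matches, and the helper names `percent_identity` and `blocks` are illustrative only. The ORF15a residues are taken from SEQ ID 82 (positions 1-213), and the ORF15 residues from the alignment above.

```python
def percent_identity(a, b):
    """Return (matches, columns, percent) for two equal-length aligned sequences.

    Assumption: gap characters ('-') and ambiguous residues ('X') never count
    as matches.
    """
    if len(a) != len(b):
        raise ValueError("aligned sequences must have the same length")
    matches = sum(1 for x, y in zip(a, b) if x == y and x not in "-X")
    return matches, len(a), 100.0 * matches / len(a)


def blocks(text):
    """Join whitespace-separated 10-residue blocks into one string."""
    return "".join(text.split())


# ORF15a, residues 1-213 (SEQ ID 82).
ORF15A_1_213 = blocks("""
MQARLLIPIL FSVFILSACG TLTGIPSHGG GKRFAVEQEL VAASARAAVK
DMDLQALHGR KVALYIATMG DQGSGSLTGG RYSIDALIRG EYINSPAVRT
DYTYPRYETT AETTSGGLTG LTTSLSTLNA PALSRTQSDG SGSKSSLGLN
IGGMGDYRNE TLTTNPRDTA FLSHLVQTVF FLRGIDVVSP ANADTDVFIN
IDVFGTIRNR TEM
""")

# Partial strain B ORF15, with X at the undetermined positions.
ORF15_PARTIAL = blocks("""
MQARLLIPIL FSVFILSACG TLTGIPSHGG XKRFAVEQEL VAASARAAVK
DMDLQALHGR KVALYIATMG DQGSGSLTGG RYSIDAXXXG EYINSPAVRT
DYTYPRYETT AETTSGGLTG LTTSLSTLNA PALSRTQSDG SGSKSSLGLN
IGGMGDYRNE TLTTNPRDTA FLSHLVQTVF FLRGIDVVSP ANADTDVFIN
IDVFGTIRNR TEM
""")

matches, columns, pct = percent_identity(ORF15_PARTIAL, ORF15A_1_213)
print(f"{matches}/{columns} = {pct:.1f}%")  # 209/213 = 98.1%
```

Under that scoring assumption the four X positions are the only non-matching columns, which is consistent with the 98.1% figure stated above.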

The complete strain B sequence (ORF15-1) and ORF15a show 98.8% identity in 320 aa overlap:

                    10        20        30        40        50        60
orf15a.pep  MQARLLIPILFSVFILSACGTLTGIPSHGGGKRFAVEQELVAASARAAVKDMDLQALHGR
            ||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
orf15-1     MQARLLIPILFSVFILSACGTLTGIPSHGGGKRFAVEQELVAASARAAVKDMDLQALHGR
                    10        20        30        40        50        60

                    70        80        90       100       110       120
orf15a.pep  KVALYIATMGDQGSGSLTGGRYSIDALIRGEYINSPAVRTDYTYPRYETTAETTSGGLTG
            ||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
orf15-1     KVALYIATMGDQGSGSLTGGRYSIDALIRGEYINSPAVRTDYTYPRYETTAETTSGGLTG
                    70        80        90       100       110       120

                   130       140       150       160       170       180
orf15a.pep  LTTSLSTLNAPALSRTQSDGSGSKSSLGLNIGGMGDYRNETLTTNPRDTAFLSHLVQTVF
            ||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
orf15-1     LTTSLSTLNAPALSRTQSDGSGSKSSLGLNIGGMGDYRNETLTTNPRDTAFLSHLVQTVF
                   130       140       150       160       170       180

                   190       200       210       220       230       240
orf15a.pep  FLRGIDVVSPANADTDVFINIDVFGTIRNRTEMHLYNAETLKAQTKLEYFAVDRTNKKLL
            ||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
orf15-1     FLRGIDVVSPANADTDVFINIDVFGTIRNRTEMHLYNAETLKAQTKLEYFAVDRTNKKLL
                   190       200       210       220       230       240

                   250       260       270       280       290       300
orf15a.pep  IKPKTNAFEAAYKENYALWMGPYKVSKGIKPTEGLMVDFSDIQPYGNHMGNSAPSVEADN
            ||||||||||||||||||||||||||||||||||||||||||:||||| |||||||||||
orf15-1     IKPKTNAFEAAYKENYALWMGPYKVSKGIKPTEGLMVDFSDIRPYGNHTGNSAPSVEADN
                   250       260       270       280       290       300

                   310       320
orf15a.pep  SHEGYGYSDEAVRRHRQGQPX
            ||||||||||:||:|||||||
orf15-1     SHEGYGYSDEVVRQHRQGQPX
                   310       320

Further work identified the corresponding gene in N.gonorrhoeae <SEQ ID 83>: 

1 ATGCGGGCAC GGCTGCTGAT ACCTATTCTT TTTTCAGTTT TTATTTTATC 

51 CGCCTGCGGG ACACTGACAG GTATTCCATC GCATGGCGGA GGCAAACGCT 

101 TCGCGGTCGA ACAAGAACTT GTGGCCGCTT CTGCCAGAGC TGCCGTTAAA 

151 GACATGGATT TACAGGCATT ACACGGACGA AAAGTTGCAT TGTACATTGC 

201 AACTATGGGC GACCAAGGTT CAGGCAGTTT GACAGGGGGT CGCTACTCCA 

251 TTGATGCACT GATTCGCGGC GAATACATAA ACAGCCCTGC CGTCCGCACC 

301 GATTACACCT ATCCGCGTTA CGAAACCACC GCTGAAACAA CATCAGGCGG 

351 TTTGACGGGT TTAACCACTT CTTTATCTAC ACTTAATGCC CCTGCACTCT 

401 CGCGCACCCA ATCAGACGGT AGCGGAAGTA GGAGCAGTCT GGGCTTAAAT 

451 ATTGGCGGGA TGGGGGATTA TCGAAATGAA ACCTTGACGA CCAACCCGCG 

501 CGACACTGCC TTTCTTTCCC ACTTGGTGCA GACCGTATTT TTCCTGCGCG 

551 GCATAGACGT TGTTTCTCCT GCCAATGCCG ATACAGATGT GTTTATTAAC 

601 ATCGACGTAT TCGGAACGAT ACGCAACAGA ACCGAAATGC ACCTATACAA 

651 TGCCGAAACA CTGAAAGCCC AAACAAAACT GGAATATTTC GCAGTAGACA 

701 GAACCAATAA AAAATTGCTC ATCAAACCCA AAACCAATGC GTTTGAAGCT 

751 GCCTATAAAG AAAATTACGC ATTGTGGATG GGGCCGTATA AAGTAAGCAA 

801 AGGAATCAAA CCGACGGAAG GATTGATGGT CGATTTCTCC GATATCCAAC 

851 CATACGGCAA TCATACGGGT AACTCCGCCC CATCCGTAGA GGCTGATAAC 

901 AGTCATGAGG GGTATGGATA CAGCGATGAA GCAGTGCGAC AACATAGACA 

951 AGGGCAACCT TGA 

This encodes a protein having amino acid sequence <SEQ ID 84; ORF15ng>: 



1 MRARLLIPIL FSVFILSACG TLTGIPSHGG GKRFAVEQEL VAASARAAVK 

51 DMDLQALHGR KVALYIATMG DQGSGSLTGG RYSIDALIRG EYINSPAVRT 

101 DYTYPRYETT AETTSGGLTG LTTSLSTLNA PALSRTQSDG SGSRSSLGLN 

151 IGGMGDYRNE TLTTNPRDTA FLSHLVQTVF FLRGIDVVSP ANADTDVFIN 

201 IDVFGTIRNR TEMHLYNAET LKAQTKLEYF AVDRTNKKLL IKPKTNAFEA 

251 AYKENYALWM GPYKVSKGIK PTEGLMVDFS DIQPYGNHTG NSAPSVEADN 

301 SHEGYGYSDE AVRQHRQGQP * 
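
The correspondence between the gonococcal nucleotide sequence and its encoded protein can be spot-checked by translating with the standard genetic code. The sketch below is illustrative only: the names `CODON_TABLE` and `translate` are assumptions, unrecognised codons are reported as X, and the two fragments are the first row and the final in-frame stretch (from nucleotide 931) of SEQ ID 83 as listed above.

```python
# Standard genetic code, built from the conventional TCAG ordering.
BASES = "TCAG"
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    b1 + b2 + b3: AMINO_ACIDS[16 * i + 4 * j + k]
    for i, b1 in enumerate(BASES)
    for j, b2 in enumerate(BASES)
    for k, b3 in enumerate(BASES)
}


def translate(dna):
    """Translate DNA in frame 1; whitespace is ignored, '*' marks a stop codon.

    Codons containing anything other than A, C, G or T are reported as 'X'.
    A trailing partial codon is dropped.
    """
    dna = "".join(dna.split()).upper()
    protein = []
    for pos in range(0, len(dna) - 2, 3):
        protein.append(CODON_TABLE.get(dna[pos:pos + 3], "X"))
    return "".join(protein)


# First 50 nucleotides of SEQ ID 83 (row 1 of the listing).
SEQ83_START = "ATGCGGGCAC GGCTGCTGAT ACCTATTCTT TTTTCAGTTT TTATTTTATC"
# Nucleotides 931-963 of SEQ ID 83 (in frame with codon 311).
SEQ83_END = "GCAGTGCGAC AACATAGACA AGGGCAACCT TGA"

print(translate(SEQ83_START))  # MRARLLIPILFSVFIL
print(translate(SEQ83_END))    # AVRQHRQGQP*
```

Both fragments agree with the corresponding residues of SEQ ID 84.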

The originally-identified partial strain B sequence (ORF15) shows 97.2% identity over a 213aa
overlap with ORF15ng:

orf15.pep   MQARLLIPILFSVFILSACGTLTGIPSHGGXKRFAVEQELVAASARAAVKDMDLQALHGR  60
            |:|||||||||||||||||||||||||||| |||||||||||||||||||||||||||||
orf15ng     MRARLLIPILFSVFILSACGTLTGIPSHGGGKRFAVEQELVAASARAAVKDMDLQALHGR  60

orf15.pep   KVALYIATMGDQGSGSLTGGRYSIDAXXXGEYINSPAVRTDYTYPRYETTAETTSGGLTG 120
            ||||||||||||||||||||||||||   |||||||||||||||||||||||||||||||
orf15ng     KVALYIATMGDQGSGSLTGGRYSIDALIRGEYINSPAVRTDYTYPRYETTAETTSGGLTG 120

orf15.pep   LTTSLSTLNAPALSRTQSDGSGSKSSLGLNIGGMGDYRNETLTTNPRDTAFLSHLVQTVF 180
            |||||||||||||||||||||||:||||||||||||||||||||||||||||||||||||
orf15ng     LTTSLSTLNAPALSRTQSDGSGSRSSLGLNIGGMGDYRNETLTTNPRDTAFLSHLVQTVF 180

orf15.pep   FLRGIDVVSPANADTDVFINIDVFGTIRNRTEM 213
            |||||||||||||||||||||||||||||||||
orf15ng     FLRGIDVVSPANADTDVFINIDVFGTIRNRTEMHLYNAETLKAQTKLEYFAVDRTNKKLL 240

The complete strain B sequence (ORF15-1) and ORF15ng show 98.8% identity in 320 aa overlap:

                    10        20        30        40        50        60
orf15-1.pep MQARLLIPILFSVFILSACGTLTGIPSHGGGKRFAVEQELVAASARAAVKDMDLQALHGR
            |:||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
orf15ng     MRARLLIPILFSVFILSACGTLTGIPSHGGGKRFAVEQELVAASARAAVKDMDLQALHGR
                    10        20        30        40        50        60

                    70        80        90       100       110       120
orf15-1.pep KVALYIATMGDQGSGSLTGGRYSIDALIRGEYINSPAVRTDYTYPRYETTAETTSGGLTG
            ||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
orf15ng     KVALYIATMGDQGSGSLTGGRYSIDALIRGEYINSPAVRTDYTYPRYETTAETTSGGLTG
                    70        80        90       100       110       120

                   130       140       150       160       170       180
orf15-1.pep LTTSLSTLNAPALSRTQSDGSGSKSSLGLNIGGMGDYRNETLTTNPRDTAFLSHLVQTVF
            |||||||||||||||||||||||:||||||||||||||||||||||||||||||||||||
orf15ng     LTTSLSTLNAPALSRTQSDGSGSRSSLGLNIGGMGDYRNETLTTNPRDTAFLSHLVQTVF
                   130       140       150       160       170       180

                   190       200       210       220       230       240
orf15-1.pep FLRGIDVVSPANADTDVFINIDVFGTIRNRTEMHLYNAETLKAQTKLEYFAVDRTNKKLL
            ||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
orf15ng     FLRGIDVVSPANADTDVFINIDVFGTIRNRTEMHLYNAETLKAQTKLEYFAVDRTNKKLL
                   190       200       210       220       230       240

                   250       260       270       280       290       300
orf15-1.pep IKPKTNAFEAAYKENYALWMGPYKVSKGIKPTEGLMVDFSDIRPYGNHTGNSAPSVEADN
            ||||||||||||||||||||||||||||||||||||||||||:|||||||||||||||||
orf15ng     IKPKTNAFEAAYKENYALWMGPYKVSKGIKPTEGLMVDFSDIQPYGNHTGNSAPSVEADN
                   250       260       270       280       290       300

                   310       320
orf15-1.pep SHEGYGYSDEVVRQHRQGQPX
            ||||||||||:||||||||||
orf15ng     SHEGYGYSDEAVRQHRQGQPX
                   310       320

Computer analysis of these amino acid sequences reveals an ILSAC motif (putative membrane 
lipoprotein lipid attachment site, as predicted by the MOTIFS program). 

A putative leader sequence is also indicated, and it was predicted that the proteins from N.meningitidis and
N.gonorrhoeae, and their epitopes, could be useful antigens for vaccines or diagnostics, or for
raising antibodies.
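
The position of the reported ILSAC motif can be located with a simple string search. The sketch below is illustrative and does not reproduce the MOTIFS program: the regular expression is a simplified lipobox-style pattern chosen here as an assumption, and the sequence is the N-terminal 30 residues of ORF15a (SEQ ID 82).

```python
import re

# N-terminal 30 residues of ORF15a (SEQ ID 82).
ORF15A_N_TERMINUS = "MQARLLIPILFSVFILSACGTLTGIPSHGG"

# Literal motif reported in the text; str.find returns a 0-based index.
motif_start = ORF15A_N_TERMINUS.find("ILSAC") + 1  # convert to 1-based

# Simplified lipobox-style pattern (an assumption for illustration only,
# not the rule applied by the MOTIFS program).
LIPOBOX = re.compile(r"[LVI][ASTVI][GAS]C")
hit = LIPOBOX.search(ORF15A_N_TERMINUS)
cys_position = hit.end()  # 1-based position of the conserved Cys

print(motif_start, hit.group(), cys_position)  # 15 LSAC 19
```

In this fragment the motif spans residues 15-19, placing the candidate lipid-attachment cysteine at position 19, shortly after the hydrophobic stretch of the putative leader sequence.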

ORF15-1 (31.7kDa) was cloned in pET and pGex vectors and expressed in E.coli, as described
above. The products of protein expression and purification were analyzed by SDS-PAGE. Figure
4A shows the results of affinity purification of the GST-fusion protein, and Figure 4B shows the
results of expression of the His-fusion in E.coli. Purified GST-fusion protein was used to immunise
mice, whose sera were used for Western blot (Figure 4C) and ELISA (positive result). These
experiments confirm that ORF15-1 is a surface-exposed protein, and that it is a useful immunogen.

Example 11 

The following partial DNA sequence was identified in N.meningitidis <SEQ ID 85>: 

1 ..GG.CAGCACA AAAAACAGGC GCTT^AACGG AAAAACCGTA TTTACGATGA 

51 TGCCGGGTAT GATATTCGGC GTATTCACGG GCGCATTCTC CGCAAAATAT 

101 ATCCCCGCGT TCGGGCTTCA AATTTTCTTC ATCCTGTTTT TAACCGCCGT 

151 CGCATTCAAA ACACTGCATA CCGACCCTCA GACGGCATCC CGCCCGCTGC 

201 CCGGACTGCC CrGACTGACT GCGGTTTCCA CACTGTTCGG CACAATGTCG 

251 AGCTGGGTCG GCATAGGCGG CGGTTCACTT TCCGTCCCCT TCTTAATCCA 

301 CTGCGGCTTC CCCGCCCATA AAGCCATCGG CACATCATCC GGCCTTGCCT 

351 GGCCGATTGC ACTCTCCGGC GCAATATCGT ATCTGCTCAA CGGCCTGAAT 

401 ATTGCAGGAT TGCCCGAAGG GTCACTGGGC TTCCTTTACC TGCCCGCCGT 

451 CGCCGTCCTC AGCGCGGCAA CCATTGCCTT TGCCCCGCTC GGTGTCAAAA 

501 CCGCCCACAA ACTTTCTTCT GCCAAACTCA AAAAATC.TT CGGCATTATG 

551 TTGCTTTTGA TTGCCGGAAA AATGCTGTAC AACCTGCTTT AA 

This corresponds to the amino acid sequence <SEQ ID 86; ORF17>: 

1 ..GQHKKQAVNG KTVFTMMPGM IFGVFTGAFS AKYIPAFGLQ IFFILFLTAV 
51 AFKTLHTDPQ TASRPLPGLP XLTAVSTLFG TMSSWVGIGG GSLSVPFLIH 
101 CGFPAHKAIG TSSGLAWPIA LSGAISYLLN GLNIAGLPEG SLGFLYLPAV 
151 AVLSAATIAF APLGVKTAHK LSSAKLKKSF GIMLLLIAGK MLYNLL* 

Further work revealed the complete nucleotide sequence <SEQ ID 87>: 

1 ATGTGGCATT GGGACATTAT CTTAATCCTG CTTGCCGTAG GCAGTGCGGC 

51 AGGTTTTATT GCCGGCCTGT TCGGCGTAGG CGGCGGCACG CTGATTGTCC 

101 CTGTCGTTTT ATGGGTGCTT GATTTGCAGG GTTTGGCACA ACATCCTTAC 

151 GCGCAACACC TCGCCGTCGG CACATCCTTC GCCGTCATGG TCTTCACCGC 

201 CTTTTCCAGT ATGCTGGGGC AGCACAAAAA ACAGGCGGTC GACTGGAAAA 

251 CCGTATTTAC GATGATGCCG GGTATGATAT TCGGCGTATT CACGGGCGCA 

301 CTCTCCGCAA AATATATCCC CGCGTTCGGG CTTCAAATTT TCTTCATCCT 

351 GTTTTTAACC GCCGTCGCAT TCAAAACACT GCATACCGAC CCTCAGACGG 

401 CATCCCGCCC GCTGCCCGGA CTGCCCGGAC TGACTGCGGT TTCCACACTG 

451 TTCGGCACAA TGTCGAGCTG GGTCGGCATA GGCGGCGGTT CACTTTCCGT 

501 CCCCTTCTTA ATCCACTGCG GCTTCCCCGC CCATAAAGCC ATCGGCACAT 

551 CATCCGGCCT TGCCTGGCCG ATTGCACTCT CCGGCGCAAT ATCGTATCTG 

601 CTCAACGGCC TGAATATTGC AGGATTGCCC GAAGGGTCAC TGGGCTTCCT 

651 TTACCTGCCC GCCGTCGCCG TCCTCAGCGC GGCAACCATT GCCTTTGCCC 

701 CGCTCGGTGT CAAAACCGCC CACAAACTTT CTTCTGCCAA ACTCAAAAAA 

751 TC.TTCGGCA TTATGTTGCT TTTGATTGCC GGAAAAATGC TGTACAACCT 

801 GCTTTAA 

This corresponds to the amino acid sequence <SEQ ID 88; ORF17-1>: 

1 MWHWDIILIL LAVGSAAGFI AGLFGVGGGT LIVPVVLWVL DLQGLAQHPY 

51 AQHLAVGTSF AVMVFTAFSS MLGQHKKQAV DWKTVFTMMP GMIFGVFTGA 

101 LSAKYIPAFG LQIFFILFLT AVAFKTLHTD PQTASRPLPG LPGLTAVSTL 

151 FGTMSSWVGI GGGSLSVPFL IHCGFPAHKA IGTSSGLAWP IALSGAISYL 

201 LNGLNIAGLP EGSLGFLYLP AVAVLSAATI AFAPLGVKTA HKLSSAKLKK 

251 XFGIMLLLIA GKMLYNLL* 

Computer analysis of this amino acid sequence gave the following results: 

Homology with hypothetical H.influenzae transmembrane protein HI0902 (accession number P44070) 
ORF17 and HI0902 proteins show 28% aa identity in 192 aa overlap: 

ORF17    3 HKKQAVNGKTVFTMMPGMIFGVFT-GAFSAKYIPAFGLQIF--FILFLTAVAFKTLHTDP 59 
           HK + + V + P ++ VF G F + +IF +++L ++ D 
HI0902  72 HKLGNIVWQAVRIIAPVIMLSVFICGLFIGRLDREISAKIFACLVVYLATia^VLSIKKD- 130 

ORF17   60 QTASRPLPGLPXLTAVSTLFGTMSSWVGIGGGSLSVPFLIHCGFPAHKAIGTSSGLAWPI 119 
           Q ++ L L + L G SS GIGGG VPFL G +AIG+S+ + 
HI0902 131 

ORF17  120 
           +SG S++++G +PE SLG++YLPAV ++A + + LG 
HI0902 190 

ORF17  180 
           F + L+++A M 
HI0902 250 FALFLIWAINM 261 

Homology with a predicted ORF from N.meningitidis (strain A) 

ORF17 shows 96.9% identity over a 196aa overlap with an ORF (ORF17a) from strain A of N. 
meningitidis: 

                                                  10        20        30
orf17.pep                                 GQHKKQAVNGKTVFTMMPGMIFGVFTGAFS
                                          ||||||||: ||||||||||:||||:||:|
orf17a      QGLAQHPYAQHLAVGTSFAVMVFTAFSSMLGQHKKQAVDWKTVFTMMPGMVFGVFAGALS
                  50        60        70        80        90       100

                    40        50        60        70        80        90
orf17.pep   AKYIPAFGLQIFFILFLTAVAFKTLHTDPQTASRPLPGLPXLTAVSTLFGTMSSWVGIGG
            |||||||||||||||||||||||||||||||||||||||| |||||||||||||||||||
orf17a      AKYIPAFGLQIFFILFLTAVAFKTLHTDPQTASRPLPGLPGLTAVSTLFGTMSSWVGIGG
                 110       120       130       140       150       160

                   100       110       120       130       140       150
orf17.pep   GSLSVPFLIHCGFPAHKAIGTSSGLAWPIALSGAISYLLNGLNIAGLPEGSLGFLYLPAV
            ||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
orf17a      GSLSVPFLIHCGFPAHKAIGTSSGLAWPIALSGAISYLLNGLNIAGLPEGSLGFLYLPAV
                 170       180       190       200       210       220

                   160       170       180       190
orf17.pep   AVLSAATIAFAPLGVKTAHKLSSAKLKKSFGIMLLLIAGKMLYNLLX
            |||||||||||||||||||||||||||||||||||||||||||||||
orf17a      AVLSAATIAFAPLGVKTAHKLSSAKLKKSFGIMLLLIAGKMLYNLLX
                 230       240       250       260
The complete length ORF17a nucleotide sequence <SEQ ID 89> is: 

1 ATGTGGCATT GGGACATTAT CTTAATCCTG CTTGCCGTAG GCAGTGCGGC 
51 AGGTTTTATT GCCGGCCTGT TCGGCGTAGG CGGCGGCACG CTGATTGTCC 
101 CTGTCGTTTT ATGGGTGCTT GATTTGCAGG GTTTGGCACA ACATCCTTAC 
151 GCGCAACACC TCGCCGTCGG CACATCCTTC GCCGTCATGG TCTTCACCGC 
201 CTTTTCCAGT ATGCTGGGGC AGCACAAAAA ACAGGCGGTC GACTGGAAAA 
251 CCGTATTTAC GATGATGCCG GGTATGGTAT TCGGCGTATT CGCTGGCGCA 
301 CTCTCCGCAA AATATATCCC AGCGTTCGGG CTTCAAATTT TCTTCATCCT 
351 GTTTTTAACC GCCGTCGCAT TCAAAACACT GCATACCGAC CCTCAGACGG 
401 CATCCCGCCC GCTGCCCGGA CTGCCCGGAC TGACTGCGGT TTCCACACTG 
451 TTCGGCACAA TGTCGAGCTG GGTCGGCATA GGCGGCGGTT CACTTTCCGT 
501 CCCCTTCTTA ATCCACTGCG GCTTCCCCGC CCATAAAGCC ATCGGCACAT 
551 CATCCGGCCT TGCCTGGCCG ATTGCACTCT CCGGCGCAAT ATCGTATCTG 
601 CTCAACGGCC TGAATATTGC AGGATTGCCC GAAGGGTCAC TGGGCTTCCT 
651 TTACCTGCCC GCCGTCGCCG TCCTCAGCGC GGCAACCATT GCCTTTGCCC 
701 CGCTCGGTGT CAAAACCGCC CACAAACTTT CTTCTGCCAA ACTCAAAAAA 
751 TCCTTCGGCA TTATGTTGCT TTTGATTGCC GGAAAAATGC TGTACAACCT 
801 GCTTTAA 




This encodes a protein having amino acid sequence <SEQ ID 90>: 

1 MWHWDIILIL LAVGSAAGFI AGLFGVGGGT LIVPVVLWVL DLQGLAQHPY 
51 AQHLAVGTSF AVMVFTAFSS MLGQHKKQAV DWKTVFTMMP GMVFGVFAGA 
101 LSAKYIPAFG LQIFFILFLT AVAFKTLHTD PQTASRPLPG LPGLTAVSTL 
151 FGTMSSWVGI GGGSLSVPFL IHCGFPAHKA IGTSSGLAWP IALSGAISYL 
201 LNGLNIAGLP EGSLGFLYLP AVAVLSAATI AFAPLGVKTA HKLSSAKLKK 
251 SFGIMLLLIA GKMLYNLL* 



ORF17a and ORF17-1 show 98.9% identity in 268 aa overlap: 

                    10        20        30        40        50        60
orf17a.pep  MWHWDIILILLAVGSAAGFIAGLFGVGGGTLIVPVVLWVLDLQGLAQHPYAQHLAVGTSF
            ||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
orf17-1     MWHWDIILILLAVGSAAGFIAGLFGVGGGTLIVPVVLWVLDLQGLAQHPYAQHLAVGTSF
                    10        20        30        40        50        60

                    70        80        90       100       110       120
orf17a.pep  AVMVFTAFSSMLGQHKKQAVDWKTVFTMMPGMVFGVFAGALSAKYIPAFGLQIFFILFLT
            ||||||||||||||||||||||||||||||||:||||:||||||||||||||||||||||
orf17-1     AVMVFTAFSSMLGQHKKQAVDWKTVFTMMPGMIFGVFTGALSAKYIPAFGLQIFFILFLT
                    70        80        90       100       110       120

                   130       140       150       160       170       180
orf17a.pep  AVAFKTLHTDPQTASRPLPGLPGLTAVSTLFGTMSSWVGIGGGSLSVPFLIHCGFPAHKA
            ||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
orf17-1     AVAFKTLHTDPQTASRPLPGLPGLTAVSTLFGTMSSWVGIGGGSLSVPFLIHCGFPAHKA
                   130       140       150       160       170       180

                   190       200       210       220       230       240
orf17a.pep  IGTSSGLAWPIALSGAISYLLNGLNIAGLPEGSLGFLYLPAVAVLSAATIAFAPLGVKTA
            ||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
orf17-1     IGTSSGLAWPIALSGAISYLLNGLNIAGLPEGSLGFLYLPAVAVLSAATIAFAPLGVKTA
                   190       200       210       220       230       240

                   250       260      269
orf17a.pep  HKLSSAKLKKSFGIMLLLIAGKMLYNLLX
            |||||||||| ||||||||||||||||||
orf17-1     HKLSSAKLKKXFGIMLLLIAGKMLYNLLX
                   250       260



Homology with a predicted ORF from N. gonorrhoeae 

ORF17 shows 93.9% identity over a 196aa overlap with a predicted ORF (ORF17.ng) from N. 
gonorrhoeae: 

orf17.pep                                 GQHKKQAVNGKTVFTMMPGMIFGVFTGAFS  30
                                          ||||||||: ||:|:||||||||||:||:|
orf17ng     QGLAQHPYAQHLAVGTSFAVMVFTAFSSMLGQHKKQAVDWKTIFAMMPGMIFGVFAGALS 102

orf17.pep   AKYIPAFGLQIFFILFLTAVAFKTLHTDPQTASRPLPGLPXLTAVSTLFGTMSSWVGIGG  90
            |||||||||||||||||||||||||||  ||||||||||| |||||||||:|||||||||
orf17ng     AKYIPAFGLQIFFILFLTAVAFKTLHTGRQTASRPLPGLPGLTAVSTLFGAMSSWVGIGG 162

orf17.pep   GSLSVPFLIHCGFPAHKAIGTSSGLAWPIALSGAISYLLNGLNIAGLPEGSLGFLYLPAV 150
            ||||||||||||||||||||||||||||||||||||||:|||||||||||||||||||||
orf17ng     GSLSVPFLIHCGFPAHKAIGTSSGLAWPIALSGAISYLVNGLNIAGLPEGSLGFLYLPAV 222

orf17.pep   AVLSAATIAFAPLGVKTAHKLSSAKLKKSFGIMLLLIAGKMLYNLL 196
            |||||||||||||||||||||||||||:||||||||||||||||||
orf17ng     AVLSAATIAFAPLGVKTAHKLSSAKLKESFGIMLLLIAGKMLYNLL 268



An ORF17ng nucleotide sequence <SEQ ID 91> is predicted to encode a protein having amino acid 
sequence <SEQ ID 92>: 

1 MWHWDIILIL LAVGSAAGFI AGLFGVGGGT LIVPVVLWVL DLQGLAQHPY 

51 AQHLAVGTSF AVMVFTAFSS MLGQHKKQAV DWKTIFAMMP GMIFGVFAGA 

101 LSAKYIPAFG LQIFFILFLT AVAFKTLHTG RQTASRPLPG LPGLTAVSTL 

151 FGAMSSWVGI GGGSLSVPFL IHCGFPAHKA IGTSSGLAWP IALSGAISYL 

201 VNGLNIAGLP EGSLGFLYLP AVAVLSAATI AFAPLGVKTA HKLSSAKLKE 

251 SFGIMLLLIA GKMLYNLL* 

Further work revealed the complete gonococcal DNA sequence <SEQ ID 93>: 



1 ATGTGGCATT GGGACATTAT CTTAATCCTG CTTGCcgtag gcAGTGCGGC 

51 AGGTTTTATT GCCGGCCTGT Tcggtgtagg cggcgGTACG CTGATTGTCC 

101 CTGTCGTTTT ATGGGTGCTT GATTTGCAGG GTTTGGCACA ACATCCTTAC 

151 GCGCAACACC TCGCCGTCGG CAcaTccttc gcCGTCATGG TCTTCACCGC 

201 CTTTTCCAGT ATGTTGGGGC AGCACAAAAA ACAGGCGGTC GACTGGAAAA 

251 CCATATTTGC GATGATGCCG GGTATGATAT TCGGCGTATT CGCTGGCGCA 

301 CTCTCCGCAA AATATATCCC CGCGTTCGGG CTTCAAATTT TCTTCATCCT 

351 GTTTTTAACC GCCGTCGCAT TCAAAACACT GCATACCGGT CGTCAGACGG 

401 CATCCCGCCC GCTGCCCGGG CTGCCCGGAC TGACTGCGGT TTCCACACTG 

451 TTCGGCGCAA TGTCGAGCTG GGTCGGCATA GGCGGCGGTT CACTTTCCGT 

501 CCCCTTCTTA ATCCACTGCG GCTTCCCCGC CCATAAAGCC ATCGGCACAT 

551 CATCCGGCCT TGCCTGGCCG ATTGCACTCT CCGGCGCAAT ATCGTATCTG 

601 GTCAACGGTC TGAATATTGC AGGATTGCCC GAAGGGTCGC TGGGCTTCCT 

651 TTACCTGCCC GCCGTCGCCG TCCTCAGCGC GGCAACCATT GCCTTTGCCC 

701 CGCTCGGTGT CAAAACCGCC CACAAACTTT CTTCTGCCAA ACTCAAAGAA 

751 TCCTTCGGCA TTATGTTGCT TTTGATTGCC GGAAAAATGC TGTACAACCT 

801 GCTTTAA 

This corresponds to the amino acid sequence <SEQ ID 94; ORF17ng-1>: 

1 MWHWDIILIL LAVGSAAGFI AGLFGVGGGT LIVPVVLWVL DLQGLAQHPY 

51 AQHLAVGTSF AVMVFTAFSS MLGQHKKQAV DWKTIFAMMP GMIFGVFAGA 

101 LSAKYIPAFG LQIFFILFLT AVAFKTLHTG RQTASRPLPG LPGLTAVSTL 

151 FGAMSSWVGI GGGSLSVPFL IHCGFPAHKA IGTSSGLAWP IALSGAISYL 

201 VNGLNIAGLP EGSLGFLYLP AVAVLSAATI AFAPLGVKTA HKLSSAKLKE 

251 SFGIMLLLIA GKMLYNLL* 

ORF17ng-1 and ORF17-1 show 96.6% identity in 268 aa overlap: 

                    10        20        30        40        50        60
orf17-1.pep MWHWDIILILLAVGSAAGFIAGLFGVGGGTLIVPVVLWVLDLQGLAQHPYAQHLAVGTSF
            ||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
orf17ng-1   MWHWDIILILLAVGSAAGFIAGLFGVGGGTLIVPVVLWVLDLQGLAQHPYAQHLAVGTSF
                    10        20        30        40        50        60

                    70        80        90       100       110       120
orf17-1.pep AVMVFTAFSSMLGQHKKQAVDWKTVFTMMPGMIFGVFTGALSAKYIPAFGLQIFFILFLT
            ||||||||||||||||||||||||:|:||||||||||:||||||||||||||||||||||
orf17ng-1   AVMVFTAFSSMLGQHKKQAVDWKTIFAMMPGMIFGVFAGALSAKYIPAFGLQIFFILFLT
                    70        80        90       100       110       120

                   130       140       150       160       170       180
orf17-1.pep AVAFKTLHTDPQTASRPLPGLPGLTAVSTLFGTMSSWVGIGGGSLSVPFLIHCGFPAHKA
            |||||||||  |||||||||||||||||||||:|||||||||||||||||||||||||||
orf17ng-1   AVAFKTLHTGRQTASRPLPGLPGLTAVSTLFGAMSSWVGIGGGSLSVPFLIHCGFPAHKA
                   130       140       150       160       170       180

                   190       200       210       220       230       240
orf17-1.pep IGTSSGLAWPIALSGAISYLLNGLNIAGLPEGSLGFLYLPAVAVLSAATIAFAPLGVKTA
            ||||||||||||||||||||:|||||||||||||||||||||||||||||||||||||||
orf17ng-1   IGTSSGLAWPIALSGAISYLVNGLNIAGLPEGSLGFLYLPAVAVLSAATIAFAPLGVKTA
                   190       200       210       220       230       240

                   250       260      269
orf17-1.pep HKLSSAKLKKXFGIMLLLIAGKMLYNLLX
            |||||||||: ||||||||||||||||||
orf17ng-1   HKLSSAKLKESFGIMLLLIAGKMLYNLLX
                   250       260

In addition, ORF17ng-1 shows significant homology with a hypothetical H.influenzae protein: 

sp|P44070|Y902_HAEIN HYPOTHETICAL PROTEIN HI0902 pir||G64015 hypothetical protein 
HI0902 - Haemophilus influenzae (strain Rd KW20) gi|1573922 (U32772) H. influenzae 
predicted coding region HI0902 [Haemophilus influenzae] Length = 264 

Score = 74 (34.9 bits), Expect = 1.6e-23, Sum P(2) = 1.6e-23 

Identities = 15/43 (34%), Positives = 23/43 (53%) 

Query: 55 AVGTSFAVMVFTAFSSMLGQHKKQAVDWKTIFAMMPGMIFGVF 97 

A+GTSFA +V T S HK + W+ + + P ++ VF 
Sbjct: 52 ALGTSFATIVITGIGSAQRHHKLGNIVWQAVRILAPVIMLSVF 94 

Score = 195 (91.9 bits), Expect = 1.6e-23, Sum P(2) = 1.6e-23 
Identities = 44/114 (38%), Positives = 65/114 (57%) 

Query: 150 LFGAMSSWVGIGGGSLSVPFLIHCGFPAHKAIGTSSGLAWPIALSGAISYLVNGLNIAGL 209 

L G SS GIGGG VPFL G +AIG+S+ + +SG S++V+G + 

Sbjct: 148 LIGMASSAAGIGGGGFIVPFLTARGINIKQAIGSSAFCGMLLGISGMFSFIVSGW(;NPLM 207 

Query: 210 PEGSLGFLYLPAVAVLSAATIAFAPLGVKTAHKLSSAKLKESFGIMLLLIAGKM 263 

PE SLG++YLPAV ++A + + LG KL + LK+ F + L+++A M 

Sbjct: 208 PEYSLGYIYLPAVLGITATSFFTSKLGASATAKLPVSTLKKGFALFLIWAINM 261 



This analysis, including the homology with the hypothetical H.influenzae transmembrane protein, 
suggests that the proteins from N.meningitidis and N.gonorrhoeae, and their epitopes, could be 
useful antigens for vaccines or diagnostics, or for raising antibodies. 

Example 12 

The following partial DNA sequence was identified in N.meningitidis <SEQ ID 95>: 

1 .GGAAACGGAT GGCAGGCAGA CCCCGAACAT CCGCTGCTCG GGCTTTTTGC 
51 CGTCAGTAAT GTATCGATGA CGCTTGCTTT TGTCGGAATA TGTGCGTTGG 
101 TGCATTATTG CTTTTCGGGA ACGGTTCAAG TGTTTGTGTT TGCGGCACTG 
151 CTCAAACTTT ATGCGCTGAA GCCGGTTTAT TGGTTCGTGT TGCAGTTTGT 
201 GCTGATGGCG GTTGCCTATG TCCACCGCTG CGGTATAGAC CGGCAGCCGC 
251 CGTCAACGTT CGGCGGCTCG CAGCTGCGAC TCGGCGGGTT GACGGCAGCG 
301 TTGATGCAGG TCTCGGTACT GGTGCTGCTG CTTTCAGAAA TTGGAAGATA 
351 A 

This corresponds to the amino acid sequence <SEQ ID 96; ORF18>: 

1 ..GNGWQADPEH PLLGLFAVSN VSMTLAFVGI CALVHYCFSG TVQVFVFAAL 

51 LKLYALKPVY WFVLQFVLMA VAYVHRCGID RQPPSTFGGS QLRLGGLTAA 

101 LMQVSVLVLL LSEIGR* 

Further work revealed the complete nucleotide sequence <SEQ ID 97>: 

1 ATGATTTTGC TGCATTTGGA TTTTTTGTCT GCCTTACTGT ATGCGGCGGT 

51 TTTTCTGTTT CTGATATTCC GCGCAGGAAT GTTGCAATGG TTTTGGGCGA 

101 GTATTATGCT GTGGCTGGGC ATATCGGTTT TGGGGGCAAA GCTGATGCCC 

151 GGCATATGGG GTATGACCCG CGCCGCGCCC TTGTTCATCC CCCATTTTTA 

201 CCTGACTTTG GGCAGCATAT TTTTTTTCAT CGGGCATTGG AACCGGAAAA 

251 CAGATGGAAA CGGATGGCAG GCAGACCCCG AACATCCGCT GCTCGGGCTT 

301 TTTGCCGTCA GTAATGTATC GATGACGCTT GCTTTTGTCG GAATATGTGC 

351 GTTGGTGCAT TATTGCTTTT CGGGAACGGT TCAAGTGTTT GTGTTTGCGG 

401 CACTGCTCAA ACTTTATGCG CTGAAGCCGG TTTATTGGTT CGTGTTGCAG 

451 TTTGTGCTGA TGGCGGTTGC CTATGTCCAC CGCTGCGGTA TAGACCGGCA 

501 GCCGCCGTCA ACGTTCGGCG GCTCGCAGCT GCGACTCGGC GGGTTGACGG 

551 CAGCGTTGAT GCAGGTCTCG GTACTGGTGC TGCTGCTTTC AGAAATTGGA 

601 AGATAA 

This corresponds to the amino acid sequence <SEQ ID 98; ORF18-1>: 

1 MILLHLDFLS ALLYAAVFLF LIFRAGMLQW FWASIMLWLG ISVLGAKLMP 
51 GIWGMTRAAP LFIPHFYLTL GSIFFFIGHW NRKTDGNGWQ ADPEHPLLGL 
101 FAVSNVSMTL AFVGICALVH YCFSGTVQVF VFAALLKLYA LKPVYWFVLQ 
151 FVLMAVAYVH RCGIDRQPPS TFGGSQLRLG GLTAALMQVS VLVLLLSEIG 
201 R* 

Computer analysis of this amino acid sequence gave the following results: 
Homology with a predicted ORF from N.meningitidis (strain A) 

ORF18 shows 98.3% identity over a 116aa overlap with an ORF (ORF18a) from strain A of N. 
meningitidis: 

                                                  10        20        30
orf18.pep                                 GNGWQADPEHPLLGLFAVSNVSMTLAFVGI
                                          ||||||||||||||||||||||||||||||
orf18a      TRAAPLFIPHFYLTLGSIFFFIGHWNRKTDGNGWQADPEHPLLGLFAVSNVSMTLAFVGI
               60        70        80        90       100       110

                    40        50        60        70        80        90
orf18.pep   CALVHYCFSGTVQVFVFAALLKLYALKPVYWFVLQFVLMAVAYVHRCGIDRQPPSTFGGS
            ||||||||| ||||||||||||||||||||||||||||||||||||||||||||||||||
orf18a      CALVHYCFSXTVQVFVFAALLKLYALKPVYWFVLQFVLMAVAYVHRCGIDRQPPSTFGGS
              120       130       140       150       160       170

                   100       110
orf18.pep   QLRLGGLTAALMQVSVLVLLLSEIGRX
            ||||||||||||| |||||||||||||
orf18a      QLRLGGLTAALMQXSVLVLLLSEIGRX
              180       190       200

The complete length ORF18a nucleotide sequence <SEQ ID 99> is: 



1 ATGATTTTGC TGCATTTGGA TTTTTTGTCT GCCTTACTGT ATGCGGCGGT 

51 TTTTCTGTTT CTGATATTCC GCGCAGGAAT GTTGCAATGG TTTTGGGCGA 

101 GTATTATGCT GTGGCTGGGC ATATCGGTTT TGGGGGCAAA GCTGATGCCC 

151 GGCATATGGG GAATGACCCG CGCCGCGCCC TTGTTCATCC CCCATTTTTA 

201 CCTGACTTTG GGCAGCATAT TTTTTTTCAT CGGGCATTGG AACCGGAAAA 

251 CGGATGGAAA CGGATGGCAG GCAGACCCCG AACATCCTCT GCTCGGGCTG 

301 TTTGCCGTCA GTAATGTATC GATGACGCTT GCTTTTGTCG GAATATGTGC 

351 GTTGGTGCAT TATTGCTTTT CGNGAACGGT TCAAGTGTTT GTGTTTGCGG 

401 CACTGCTCAA ACTTTATGCG CTGAAGCCGG TTTATTGGTT CGTGTTGCAG 
451 TTTGTGCTGA TGGCGGTTGC CTATGTCCAC CGCTGCGGTA TAGACCGGCA 
501 GCCGCCGTCA ACGTTCGGCG GNTCGCAGCT GCGACTCGGC GGGTTGACGG 
551 CAGCGTTGAT GCAGNTCTCG GTACTGGTGC TGCTGCTTTC AGAAATTGGA 
601 AGATAA 

This encodes a protein having amino acid sequence <SEQ ID 100>: 



1 MILLHLDFLS ALLYAAVFLF LIFRAGMLQW FWASIMLWLG ISVLGAKLMP 
51 GIWGMTRAAP LFIPHFYLTL GSIFFFIGHW NRKTDGNGWQ ADPEHPLLGL 
101 FAVSNVSMTL AFVGICALVH YCFSXTVQVF VFAALLKLYA LKPVYWFVLQ 
151 FVLMAVAYVH RCGIDRQPPS TFGGSQLRLG GLTAALMQXS VLVLLLSEIG 
201 R* 

ORF18a and ORF18-1 show 99.0% identity in 201 aa overlap: 

                    10        20        30        40        50        60
orf18a.pep  MILLHLDFLSALLYAAVFLFLIFRAGMLQWFWASIMLWLGISVLGAKLMPGIWGMTRAAP
            ||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
orf18-1     MILLHLDFLSALLYAAVFLFLIFRAGMLQWFWASIMLWLGISVLGAKLMPGIWGMTRAAP
                    10        20        30        40        50        60

                    70        80        90       100       110       120
orf18a.pep  LFIPHFYLTLGSIFFFIGHWNRKTDGNGWQADPEHPLLGLFAVSNVSMTLAFVGICALVH
            ||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
orf18-1     LFIPHFYLTLGSIFFFIGHWNRKTDGNGWQADPEHPLLGLFAVSNVSMTLAFVGICALVH
                    70        80        90       100       110       120

                   130       140       150       160       170       180
orf18a.pep  YCFSXTVQVFVFAALLKLYALKPVYWFVLQFVLMAVAYVHRCGIDRQPPSTFGGSQLRLG
            |||| |||||||||||||||||||||||||||||||||||||||||||||||||||||||
orf18-1     YCFSGTVQVFVFAALLKLYALKPVYWFVLQFVLMAVAYVHRCGIDRQPPSTFGGSQLRLG
                   130       140       150       160       170       180

                   190       200
orf18a.pep  GLTAALMQXSVLVLLLSEIGRX
            |||||||| |||||||||||||
orf18-1     GLTAALMQVSVLVLLLSEIGRX
                   190       200
190 200 



Homology with a predicted ORF from N.gonorrhoeae 

ORF18 shows 93.1% identity over a 116aa overlap with a predicted ORF (ORF18.ng) from N. 
gonorrhoeae: 

orf18.pep                                 GNGWQADPEHPLLGLFAVSNVSMTLAFVGI  30
                                          ||||||||||||||||||||||||||||||
orf18ng     TRAAPLFIPHFYLTLGSIFFFIGYWNRKTDGNGWQADPEHPLLGLFAVSNVSMTLAFVGI 115

orf18.pep   CALVHYCFSGTVQVFVFAALLKLYALKPVYWFVLQFVLMAVAYVHRCGIDRQPPSTFGGS  90
            ||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
orf18ng     CALVHYCFSGTVQVFVFAALLKLYALKPVYWFVLQFVLMAVAYVHRCGIDRQPPSTFGGS 175

orf18.pep   QLRLGGLTAALMQVSVLVLLLSEIGR 116
            ||||| |:| ||||:| ::||:||||
orf18ng     QLRLGVLAAMLMQVAVTAMLLAEIGR 201

The complete length ORF18ng nucleotide sequence is <SEQ ID 101>: 



1 ATGATTTTGC TGCATTTGGA TTTTTTGTCT GCCTTACTGt aTGCGGcggt 

51 tttTctgTTT CTGATATTCC GCGCAGGAAT GTTGCAATGG TTTTGGGCGA 

101 GTATTGCGTT GTGGCTCGGC ATCTCGGTTT TAGGGGTAAA GCTGATGCCG 

151 GGGATGTGGG GAATGACCCG CGCCGCGCCT TTGTTCATCC CCCATTTTTA 

201 CCTGACTTTG GGCAGCATAT TTTTTTTCAT CGGGTATTGG AACCGGAAAA 

251 CAGATGGAAA CGGATGGCAG GCAGACCCCG AACATCCGCT GCTCGGGCTT 

301 TTTGCCGTCA GTAATGTATC GATGACGCTT GCTTTTGTCG GAATATGTGC 

351 GTTGGTGCAT TATTGCTTTT CGGGAACGGT TCAAGTGTTT GTGTTTGCGG 

401 CATTGCTCAA ACTTTATGCG CTGAAGCCGG TTTATTGGTT CGTGTTGCAG 

451 TTTGTATTGA TGGCGGttgC CTATGTCCAC CGCTGCGGTA TAGACCGGCA 

501 GCCGCCGTCA ACGTTCGGCG GTTCGCAGCT GCGACTCGGC GTGTTGGCGG 
551 CGATGTTGAT GCAGGTTGCG GTAACGGCGA TGCTGCTTGC CGAAATCGGC 
601 AGATGA 

This encodes a protein having amino acid sequence <SEQ ID 102>: 

1 MILLHLDFLS ALLYAAVFLF LIFRAGMLQW FWASIALWLG ISVLGVKLMP 
51 GMWGMTRAAP LFIPHFYLTL GSIFFFIGYW NRKTDGNGWQ ADPEHPLLGL 
101 FAVSNVSMTL AFVGICALVH YCFSGTVQVF VFAALLKLYA LKPVYWFVLQ 
151 FVLMAVAYVH RCGIDRQPPS TFGGSQLRLG VLAAMLMQVA VTAMLLAEIG 
201 R* 

This ORF18ng protein sequence shows 94.0% identity in 201 aa overlap with ORF18-1: 

                    10        20        30        40        50        60
orf18-1.pep MILLHLDFLSALLYAAVFLFLIFRAGMLQWFWASIMLWLGISVLGAKLMPGIWGMTRAAP
            ||||||||||||||||||||||||||||||||||| |||||||||:|||||:||||||||
orf18ng     MILLHLDFLSALLYAAVFLFLIFRAGMLQWFWASIALWLGISVLGVKLMPGMWGMTRAAP
                    10        20        30        40        50        60

                    70        80        90       100       110       120
orf18-1.pep LFIPHFYLTLGSIFFFIGHWNRKTDGNGWQADPEHPLLGLFAVSNVSMTLAFVGICALVH
            ||||||||||||||||||:|||||||||||||||||||||||||||||||||||||||||
orf18ng     LFIPHFYLTLGSIFFFIGYWNRKTDGNGWQADPEHPLLGLFAVSNVSMTLAFVGICALVH
                    70        80        90       100       110       120

                   130       140       150       160       170       180
orf18-1.pep YCFSGTVQVFVFAALLKLYALKPVYWFVLQFVLMAVAYVHRCGIDRQPPSTFGGSQLRLG
            ||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
orf18ng     YCFSGTVQVFVFAALLKLYALKPVYWFVLQFVLMAVAYVHRCGIDRQPPSTFGGSQLRLG
                   130       140       150       160       170       180

                   190       200
orf18-1.pep GLTAALMQVSVLVLLLSEIGRX
             |:| ||||:| ::||:|||||
orf18ng     VLAAMLMQVAVTAMLLAEIGRX
                   190       200



Based on this analysis, including the presence of several putative transmembrane domains in the 
gonococcal protein, it is predicted that the proteins from N. meningitidis and N. gonorrhoeae, and 
their epitopes, could be useful antigens for vaccines or diagnostics, or for raising antibodies. 
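
Putative transmembrane domains of the kind mentioned above are commonly screened for with a sliding-window hydropathy profile. The sketch below is illustrative and is not the program used for the original analysis: it uses the Kyte-Doolittle hydropathy scale with a 19-residue window and a cut-off of 1.6, all of which are assumptions here, and the function name `hydropathy_profile` is hypothetical. The sequence is residues 1-50 of the gonococcal protein (SEQ ID 102).

```python
# Kyte-Doolittle hydropathy values for the 20 standard amino acids.
KYTE_DOOLITTLE = {
    "A": 1.8, "R": -4.5, "N": -3.5, "D": -3.5, "C": 2.5,
    "Q": -3.5, "E": -3.5, "G": -0.4, "H": -3.2, "I": 4.5,
    "L": 3.8, "K": -3.9, "M": 1.9, "F": 2.8, "P": -1.6,
    "S": -0.8, "T": -0.7, "W": -0.9, "Y": -1.3, "V": 4.2,
}


def hydropathy_profile(seq, window=19):
    """Return the mean hydropathy of every full window along the sequence."""
    scores = [KYTE_DOOLITTLE[aa] for aa in seq]
    return [
        sum(scores[i:i + window]) / window
        for i in range(len(scores) - window + 1)
    ]


# Residues 1-50 of the ORF18ng protein (SEQ ID 102).
ORF18NG_1_50 = "MILLHLDFLSALLYAAVFLFLIFRAGMLQWFWASIALWLGISVLGVKLMP"

profile = hydropathy_profile(ORF18NG_1_50)
# 1-based start positions of windows above the assumed cut-off of 1.6.
candidate_starts = [i + 1 for i, value in enumerate(profile) if value > 1.6]
print(f"max window hydropathy: {max(profile):.2f}")
print("candidate window starts:", candidate_starts)
```

Windows whose mean exceeds the cut-off are only candidates for membrane-spanning segments; a screen of this kind would normally be confirmed with a dedicated topology-prediction method.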



Example 13 

The following partial DNA sequence was identified in N.meningitidis <SEQ ID 103>: 

1 ATGAAAACCC CACTCCTCAA GCCTCTGCTN ATTACCTCGC TTCCCGTTTT 

51 CGCCAGTGTT TTTACCGCCG CCTCCATCGT CTGGCAGCTA GGCGAACCCA 

101 AGCTCGCCAT GCCCTTCGTA CTCGGCATCA TCGCCGGCGG CCTTGTCGAT 

151 TTGGACAACC NCNTGACCGG ACGGCTNAAA AACATCATCA CCACCGTCGC 

201 CCTGTTCACC CTCTCCTCGC TCACGGCACA AAGCACCCTC GGCACAGGGC 

251 TGCCCTTCAT CCTCGCCATG ACCCTGATGA CTT.CG.CTT CACCATTTTA 

301 GGCGCGGNCG . . . 

This corresponds to the amino acid sequence <SEQ ID 104; ORF19>: 






1 MKTPLLKPLL ITSLPVFASV FTAASIVWQL GEPKLAMPFV LGIIAGGLVD 
51 LDNXXTGRLK NIITTVALFT LSSLTAQSTL GTGLPFILAM TLMTXXFTIL 
101 GAX... 

Further work revealed the complete nucleotide sequence <SEQ ID 105>: 



1 ATGAAAACCC CACTCCTCAA GCCTCTGCTC ATTACCTCGC TTCCCGTTTT 

51 CGCCAGTGTT TTTACCGCCG CCTCCATCGT CTGGCAGCTA GGCGAACCCA 

101 AGCTCGCCAT GCCCTTCGTA CTCGGCATCA TCGCCGGCGG CCTTGTCGAT 

151 TTGGACAACC GCCTGACCGG ACGGCTGAAA AACATCATCA CCACCGTCGC 






201 CCTGTTCACC CTCTCCTCGC TCACGGCACA AAGCACCCTC GGCACAGGGC 

251 TGCCCTTCAT CCTCGCCATG ACCCTGATGA CCTTCGGCTT CACCATTTTA 

301 GGCGCGGTCG GGCTCAAATA CCGCACCTTC GCCTTCGGTG CACTCGCCGT 

351 CGCCACCTAC ACCACACTTA CCTACACCCC CGAAACCTAC TGGCTGACCA 

401 ACCCCTTCAT GATTTTATGC GGCACCGTAC TGTACAGCAC CGCCATCCTC 

451 CTGTTCCAAA TCGTCCTGCC CCACCGCCCC GTCCAAGAAA GCGTCGCCAA 

501 CGCCTACGAC GCACTCGGCG GCTACCTCGA AGCCAAAGCC GACTTCTTCG 

551 ACCCCGATGA GGCAGCCTGG ATAGGCAACC GCCACATCGA CCTCGCCATG 

601 AGCAACACCG GCGTCATCAC CGCCTTCAAC CAATGCCGTT CCGCCCTGTT 

651 TTACCGCCTT CGCGGCAAAC ACCGCCACCC GCGCACCGCC AAAATGCTGC 

701 GTTACTACTT TGCCGCCCAA GACATACACG AACGCATCAG CTCCGCCCAC 

751 GTCGATTATC AGGAAATGTC CGAAAAATTC AAAAACACCG ACATCATCTT 

801 CCGCATCCAC CGCCTGCTCG AAATGCAGGG ACAAGCCTGC CGCAACACCG 

851 CCCAAGCCCT GCGCGCAAGC AAAGACTACG TTTACAGCAA ACGCCTCGGC 

901 CGCGCCATCG AAGGCTGCCG CCAATCGCTG CGCCTCCTTT CAGACAGCAA 

951 CGACAGTCCC GACATCCGCC ACCTGCGCCG CCTTCTCGAC AACCTCGGCA 

1001 GCGTCGACCA GCAGTTCCGC CAACTCCAGC ACAACGGCCT GCAGGCAGAA 

1051 AACGACCGCA TGGGCGACAC CCGCATCGCC GCCCTCGAAA CCAGCAGCCT 

1101 CAAAAACACC TGGCAGGCAA TCCGTCCGCA GCTAAACCTC GAATCAGGCG 

1151 TATTCCGCCA TGCCGTCCGC CTGTCCCTCG TCGTTGCCGC CGCCTGCACC 

1201 ATCGTCGAAG CCCTCAACCT CAACCTCGGC TACTGGATAC TACTGACCGC 

1251 CCTTTTCGTC TGCCAACCCA ACTACACCGC CACCAAAAGC CGCGTCCGCC 

1301 AGCGCATCGC CGGCACCGTA CTCGGCGTAA TCGTCGGCTC GCTCGTCCCC 

1351 TACTTCACCC CGTCTGTCGA AACCAAACTC TGGATTGTCA TCGCCAGTAC 

1401 CACCCTCTTT TTCATGACCC GCACCTACAA ATACAGTTTC TCCACCTTCT 

1451 TCATTACCAT TCAAGCCCTG ACCAGCCTCT CCCTCGCAGG TTTGGACGTA 

1501 TACGCCGCCA TGCCCGTACG CATCATCGAC ACCATTATCG GCGCATCCCT 

1551 TGCCTGGGCG GCAGTCAGCT ACCTGTGGCC AGACTGGAAA TACCTCACGC 

1601 TCGAACGCAC CGCCGCCCTT GCCGTATGCA GCAACGGTGC CTATCTCGAA 

1651 AAAATCACCG AACGCCTCAA AAGCGGCGAA ACCGGCGACG ACGTCGAATA 

1701 CCGCGCCACC CGCCGCCGCG CCCACGAACA CACCGCCGCC CTCAGCAGCA 

1751 CCCTTTCCGA CATGAGCAGC GAACCCGCAA AATTCGCCGA CAGCCTGCAA 

1801 CCCGGCTTTA CCCTGCTCAA AACCGGCTAC GCCCTGACCG GCTACATCTC 

1851 CGCCCTCGGC GCATACCGCA GCGAAATGCA CGAAGAATGC AGCCCCGACT 

1901 TTACCGCACA GTTCCACCTC GCCGCCGAAC ACACCGCCCA CATCTTCCAA 

1951 CACCTGCCCG AAACCGAACC CGACGACTTT CAGACAGCAC TGGATACACT 

2001 GCGCGGCGAA CTCGACACCC TCCGCACCCA CAGCAGCGGA ACACAAAGCC 

2051 ACATCCTCCT CCAACAGCTC CAACTCATCG CCCGACAGCT CGAACCCTAC 

2101 TACCGCGCCT ACCGCCAAAT TCCGCACAGG CAGCCCCAAA ATGCAGCCTG 

2151 A 

This corresponds to the amino acid sequence <SEQ ID 106; ORF19-1>: 

1 MKTPLLKPLL ITSLPVFASV FTAASIVWQL GEPKLAMPFV LGIIAGGLVD 
51 LDNRLTGRLK NIITTVALFT LSSLTAQSTL GTGLPFILAM TLMTFGFTIL 
101 GAVGLKYRTF AFGALAVATY TTLTYTPETY WLTNPFMILC GTVLYSTAIL 
151 LFQIVLPHRP VQESVANAYD ALGGYLEAKA DFFDPDEAAW IGNRHIDLAM 
201 SNTGVITAFN QCRSALFYRL RGKHRHPRTA KMLRYYFAAQ DIHERISSAH 
251 VDYQEMSEKF KNTDIIFRIH RLLEMQGQAC RNTAQALRAS KDYVYSKRLG 
301 RAIEGCRQSL RLLSDSNDSP DIRHLRRLLD NLGSVDQQFR QLQHNGLQAE 
351 NDRMGDTRIA ALETSSLKNT WQAIRPQLNL ESGVFRHAVR LSLVVAAACT 
401 IVEALNLNLG YWILLTALFV CQPNYTATKS RVRQRIAGTV LGVIVGSLVP 
451 YFTPSVETKL WIVIASTTLF FMTRTYKYSF STFFITIQAL TSLSLAGLDV 
501 YAAMPVRIID TIIGASLAWA AVSYLWPDWK YLTLERTAAL AVCSNGAYLE 
551 KITERLKSGE TGDDVEYRAT RRRAHEHTAA LSSTLSDMSS EPAKFADSLQ 
601 PGFTLLKTGY ALTGYISALG AYRSEMHEEC SPDFTAQFHL AAEHTAHIFQ 
651 HLPETEPDDF QTALDTLRGE LDTLRTHSSG TQSHILLQQL QLIARQLEPY 
701 YRAYRQIPHR QPQNAA* 

Computer analysis of this amino acid sequence gave the following results: 

Homology with predicted transmembrane protein YHFK of H. influenzae (accession number P44289) 
ORF19 and YHFK proteins show 45% aa identity in 97 aa overlap: 

orf19   6 LKPLLITSLPVFASVFTAASIVWQLGEPKLAMPFVLGIIAGGLVDLDNXXTGRLKNIITT 65 
          L   +I+++PVF +V  AA  +W       +MP +LGIIAGGLVDLDN  TGRLKN+  T 
YHFK    5 LNAKVISTIPVFIAVNIAAVGIWFFDISSQSMPLILGIIAGGLVDLDNRLTGRLKNVFFT 64 

orf19  66 VALFTLSSLTAQSTLGTGLPFILAMTLMTXXFTILGA 102 
          +  F++SS   Q  +G  + +I+ MT++T  FT++GA 
YHFK   65 LIAFSISSFIVQLHIGKPIQYIVLMTVLTFIFTMIGA 101 



Homology with a predicted ORF from N. meningitidis (strain A) 

ORF19 shows 92.2% identity over a 102aa overlap with an ORF (ORF19a) from strain A of N. 
meningitidis: 

                    10        20        30        40        50        60 
orf19.pep   MKTPLLKPLLITSLPVFASVFTAASIVWQLGEPKLAMPFVLGIIAGGLVDLDNXXTGRLK 
            |||| ||||||||||||||||||||||||||||||||||||||||||||||||  ||||| 
orf19a      MKTPPLKPLLITSLPVFASVFTAASIVWQLGEPKLAMPFVLGIIAGGLVDLDNRLTGRLK 
                    10        20        30        40        50        60 

                    70        80        90       100 
orf19.pep   NIITTVALFTLSSLTAQSTLGTGLPFILAMTLMTXXFTILGAX 
            |||:||||||||||:|||||||||||||||||||  |||:|| 
orf19a      NIIATVALFTLSSLVAQSTLGTGLPFILAMTLMTFGFTIMGAVGLKYRTFAFGALAVATY 
                    70        80        90       100       110       120 

orf19a      TTLTYTPETYWLTNPFMILCGTVLYSTAIILFQIILPHRPVQENVANAYEALGSYLEAKA 
                   130       140       150       160       170       180 



The complete length ORF19a nucleotide sequence <SEQ ID 107> is: 

1 ATGAAAACCC CACCCCTCAA GCCTCTGCTC ATTACCTCGC TTCCCGTTTT 
51 CGCCAGTGTC TTTACCGCCG CCTCCATCGT CTGGCAGCTG GGCGAACCCA 
101 AGCTCGCCAT GCCCTTCGTA CTCGGCATCA TCGCTGGCGG CCTGGTCGAT 
151 TTGGACAACC GCCTGACCGG ACGGCTGAAA AACATCATCG CCACCGTCGC 
201 CCTGTTCACC CTCTCCTCAC TTGTCGCGCA AAGCACCCTC GGCACAGGTT 
251 TGCCATTCAT CCTCGCCATG ACCCTGATGA CTTTCGGCTT TACCATCATG 
301 GGCGCGGTCG GGCTGAAATA CCGCACCTTC GCCTTCGGCG CACTCGCCGT 
351 CGCCACCTAC ACCACACTTA CCTACACCCC CGAAACCTAC TGGCTGACCA 
401 ACCCCTTTAT GATTCTGTGC GGAACCGTAC TGTACAGCAC CGCCATCATC 
451 CTGTTCCAAA TCATCCTGCC CCACCGCCCC GTTCAAGAAA ACGTCGCCAA 
501 CGCCTACGAA GCACTCGGCA GCTACCTCGA AGCCAAAGCC GACTTTTTCG 
551 ATCCCGACGA AGCCGAATGG ATAGGCAACC GCCACATCGA CCTCGCCATG 
601 AGCAACACCG GCGTCATCAC CGCCTTCAAC CAATGCCGTT CCGCCCTGTT 
651 TTACCGCCTT CGCGGCAAAC ACCGCCACCC GCGCACCGCC AAAATGCTGC 
701 GCTACTACTT CGCCGCCCAA GACATACACG AACGCATCAG CTCCGCCCAC 
751 GTCGACTACC AAGAGATGTC CGAAAAATTC AAAAACACCG ACATCATCTT 
801 CCGCATCCAC CGCCTGCTCG AAATGCAGGG ACAAGCCTGC CGCAACACCG 
851 CCCAAGCCCT GCGCGCAAGC AAAGACTACG TTTACAGCAA ACGCCTCGGC 
901 CGCGCCATCG AAGGCTGCCG CCAATCGCTG CGCCTCCTTT CAGACAGCAA 
951 CGACAATCCC GACATCCGCC ACCTGCGCCG CCTTCTCGAC AACCTCGGCA 
1001 GCGTCGACCA GCAGTTCCGC CAACTCCAGC ACAACGGCCT GCAGGCAGAA 
1051 AACGACCGCA TGGGCGACAC CCGCATCGCC GCCCTCGAAA CCGGCAGCCT 
1101 CAAAAACACC TGGCAGGCAA TCCGTCCGCA GCTAAACCTC GAATCAGGCG 
1151 TATTCCGCCA TGCCGTCCGC CTGTCCCTTG TCGTTGCCGC CGCCTGCACC 
1201 ATCGTCGAAG CCCTCAACCT CAACCTCGGC TACTGGATAC TACTGACCGC 
1251 CCTTTTCGTC TGCCAACCCA ACTACACCGC CACCAAAAGC CGCGTCCGCC 
1301 AGCGCATCGC CGGCACCGTA CTCGGCGTAA TCGTCGGCTC GCTCGTCCCC 
1351 TACTTTACCC CCTCCGTCGA AACCAAACTC TGGATCGTCA TCGCCAGTAC 
1401 CACCCTCTTT TTCATGACCC GCACCTACAA ATACAGCTTC TCGACATTTT 
1451 TCATCACCAT TCAAGCCCTG ACCAGCCTCT CCCTCGCAGG GTTGGACGTA 
1501 TACGCCGCCA TGCCCGTACG CATCATCGAC ACCATTATCG GCGCATCCCT 
1551 TGCCTGGGCG GCAGTCAGCT ACCTGTGGCC AGACTGGAAA TACCTCACGC 
1601 TCGAACGCAC CGCCGCCCTT GCCGTATGCA GCAACGGCGC CTATCTCGAA 
1651 AAAATCACCG AACGCCTCAA AAGCGGCGAA ACCGGCGACG ACGTCGAATA 
1701 CCGCGCCACC CGCCGCCGCG CCCACGAACA CACCGCCGCC CTCAGCAGCA 
1751 CCCTTTCCGA CATGAGCAGC GAACCCGCAA AATTCGCCGA CAGCCTGCAA 
1801 CCCGGCTTTA CCCTGCTCAA AACCGGCTAC GCCCTGACCG GCTACATCTC 
1851 CGCCCTCGGC GCATACCGCA GCGAAATGCA CGAAGAATGC AGCCCCGACT 
1901 TTACCGCACA GTTCCACCTC GCCGCCGAAC ACACCGCCCA CATCTTCCAA 
1951 CACCTGCCCG AAACCGAACC CGACGACTTT CAGACAGCAC TGGATACACT 
2001 GCGCGGCGAA CTCGACACCC TCCGCACCCA CAGCAGCGGA ACACAAAGCC 
2051 ACATCCTCCT CCAACAGCTC CAACTCATCG CCCGGCAGCT CGAACCCTAC 
2101 TACCGCGCCT ACCGACAAAT TCCGCACAGG CAGCCCCAAA ACGCAGCCTG 
2151 A 

This encodes a protein having amino acid sequence <SEQ ID 108>: 

1 MKTPPLKPLL ITSLPVFASV FTAASIVWQL GEPKLAMPFV LGIIAGGLVD 
51 LDNRLTGRLK NIIATVALFT LSSLVAQSTL GTGLPFILAM TLMTFGFTIM 
101 GAVGLKYRTF AFGALAVATY TTLTYTPETY WLTNPFMILC GTVLYSTAII 
151 LFQIILPHRP VQENVANAYE ALGSYLEAKA DFFDPDEAEW IGNRHIDLAM 
201 SNTGVITAFN QCRSALFYRL RGKHRHPRTA KMLRYYFAAQ DIHERISSAH 
251 VDYQEMSEKF KNTDIIFRIH RLLEMQGQAC RNTAQALRAS KDYVYSKRLG 
301 RAIEGCRQSL RLLSDSNDNP DIRHLRRLLD NLGSVDQQFR QLQHNGLQAE 
351 NDRMGDTRIA ALETGSLKNT WQAIRPQLNL ESGVFRHAVR LSLVVAAACT 
401 IVEALNLNLG YWILLTALFV CQPNYTATKS RVRQRIAGTV LGVIVGSLVP 
451 YFTPSVETKL WIVIASTTLF FMTRTYKYSF STFFITIQAL TSLSLAGLDV 
501 YAAMPVRIID TIIGASLAWA AVSYLWPDWK YLTLERTAAL AVCSNGAYLE 
551 KITERLKSGE TGDDVEYRAT RRRAHEHTAA LSSTLSDMSS EPAKFADSLQ 
601 PGFTLLKTGY ALTGYISALG AYRSEMHEEC SPDFTAQFHL AAEHTAHIFQ 
651 HLPETEPDDF QTALDTLRGE LDTLRTHSSG TQSHILLQQL QLIARQLEPY 
701 YRAYRQIPHR QPQNAA* 

ORF19a and ORF19-1 show 98.3% identity in 716 aa overlap: 

                     10        20        30        40        50        60 
orf19a.pep   MKTPPLKPLLITSLPVFASVFTAASIVWQLGEPKLAMPFVLGIIAGGLVDLDNRLTGRLK 
             |||| ||||||||||||||||||||||||||||||||||||||||||||||||||||||| 
orf19-1      MKTPLLKPLLITSLPVFASVFTAASIVWQLGEPKLAMPFVLGIIAGGLVDLDNRLTGRLK 
                     10        20        30        40        50        60 

                     70        80        90       100       110       120 
orf19a.pep   NIIATVALFTLSSLVAQSTLGTGLPFILAMTLMTFGFTIMGAVGLKYRTFAFGALAVATY 
             |||:||||||||||:||||||||||||||||||||||||:|||||||||||||||||||| 
orf19-1      NIITTVALFTLSSLTAQSTLGTGLPFILAMTLMTFGFTILGAVGLKYRTFAFGALAVATY 
                     70        80        90       100       110       120 

                    130       140       150       160       170       180 
orf19a.pep   TTLTYTPETYWLTNPFMILCGTVLYSTAIILFQIILPHRPVQENVANAYEALGSYLEAKA 
             |||||||||||||||||||||||||||||:||||:||||||||:|||||:|||:|||||| 
orf19-1      TTLTYTPETYWLTNPFMILCGTVLYSTAILLFQIVLPHRPVQESVANAYDALGGYLEAKA 
                    130       140       150       160       170       180 

                    190       200       210       220       230       240 
orf19a.pep   DFFDPDEAEWIGNRHIDLAMSNTGVITAFNQCRSALFYRLRGKHRHPRTAKMLRYYFAAQ 
             |||||||| ||||||||||||||||||||||||||||||||||||||||||||||||||| 
orf19-1      DFFDPDEAAWIGNRHIDLAMSNTGVITAFNQCRSALFYRLRGKHRHPRTAKMLRYYFAAQ 
                    190       200       210       220       230       240 

                    250       260       270       280       290       300 
orf19a.pep   DIHERISSAHVDYQEMSEKFKNTDIIFRIHRLLEMQGQACRNTAQALRASKDYVYSKRLG 
             |||||||||||||||||||||||||||||||||||||||||||||||||||||||||||| 
orf19-1      DIHERISSAHVDYQEMSEKFKNTDIIFRIHRLLEMQGQACRNTAQALRASKDYVYSKRLG 
                    250       260       270       280       290       300 

                    310       320       330       340       350       360 
orf19a.pep   RAIEGCRQSLRLLSDSNDNPDIRHLRRLLDNLGSVDQQFRQLQHNGLQAENDRMGDTRIA 
             ||||||||||||||||||:||||||||||||||||||||||||||||||||||||||||| 
orf19-1      RAIEGCRQSLRLLSDSNDSPDIRHLRRLLDNLGSVDQQFRQLQHNGLQAENDRMGDTRIA 
                    310       320       330       340       350       360 

                    370       380       390       400       410       420 
orf19a.pep   ALETGSLKNTWQAIRPQLNLESGVFRHAVRLSLVVAAACTIVEALNLNLGYWILLTALFV 
             ||||:||||||||||||||||||||||||||||||||||||||||||||||||||||||| 
orf19-1      ALETSSLKNTWQAIRPQLNLESGVFRHAVRLSLVVAAACTIVEALNLNLGYWILLTALFV 
                    370       380       390       400       410       420 

                    430       440       450       460       470       480 
orf19a.pep   CQPNYTATKSRVRQRIAGTVLGVIVGSLVPYFTPSVETKLWIVIASTTLFFMTRTYKYSF 
             |||||||||||||||||||||||||||||||||||||||||||||||||||||||||||| 
orf19-1      CQPNYTATKSRVRQRIAGTVLGVIVGSLVPYFTPSVETKLWIVIASTTLFFMTRTYKYSF 
                    430       440       450       460       470       480 

                    490       500       510       520       530       540 
orf19a.pep   STFFITIQALTSLSLAGLDVYAAMPVRIIDTIIGASLAWAAVSYLWPDWKYLTLERTAAL 
             |||||||||||||||||||||||||||||||||||||||||||||||||||||||||||| 
orf19-1      STFFITIQALTSLSLAGLDVYAAMPVRIIDTIIGASLAWAAVSYLWPDWKYLTLERTAAL 
                    490       500       510       520       530       540 

                    550       560       570       580       590       600 
orf19a.pep   AVCSNGAYLEKITERLKSGETGDDVEYRATRRRAHEHTAALSSTLSDMSSEPAKFADSLQ 
             |||||||||||||||||||||||||||||||||||||||||||||||||||||||||||| 
orf19-1      AVCSNGAYLEKITERLKSGETGDDVEYRATRRRAHEHTAALSSTLSDMSSEPAKFADSLQ 
                    550       560       570       580       590       600 

                    610       620       630       640       650       660 
orf19a.pep   PGFTLLKTGYALTGYISALGAYRSEMHEECSPDFTAQFHLAAEHTAHIFQHLPETEPDDF 
             |||||||||||||||||||||||||||||||||||||||||||||||||||||||||||| 
orf19-1      PGFTLLKTGYALTGYISALGAYRSEMHEECSPDFTAQFHLAAEHTAHIFQHLPETEPDDF 
                    610       620       630       640       650       660 

                    670       680       690       700       710 
orf19a.pep   QTALDTLRGELDTLRTHSSGTQSHILLQQLQLIARQLEPYYRAYRQIPHRQPQNAAX 
             ||||||||||||||||||||||||||||||||||||||||||||||||||||||||| 
orf19-1      QTALDTLRGELDTLRTHSSGTQSHILLQQLQLIARQLEPYYRAYRQIPHRQPQNAAX 
                    670       680       690       700       710 
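
The identity figures quoted for these alignments are simple ratios, and the arithmetic can be checked with a short sketch. The function below is a minimal illustration and not the alignment program used for the analysis: the name `percent_identity` is hypothetical, and it assumes the two sequences have already been aligned without gaps, as is the case for ORF19a and ORF19-1.

```python
def percent_identity(a, b):
    """Percentage of aligned positions carrying the same residue.

    Assumes `a` and `b` are already aligned to equal length with no gaps.
    """
    if len(a) != len(b):
        raise ValueError("sequences must already be aligned to equal length")
    if not a:
        raise ValueError("empty alignment")
    matches = sum(x == y for x, y in zip(a, b))
    return 100.0 * matches / len(a)


# Residues 1-60 of ORF19a and ORF19-1, copied from the alignment above.
orf19a_block = "MKTPPLKPLLITSLPVFASVFTAASIVWQLGEPKLAMPFVLGIIAGGLVDLDNRLTGRLK"
orf19_1_block = "MKTPLLKPLLITSLPVFASVFTAASIVWQLGEPKLAMPFVLGIIAGGLVDLDNRLTGRLK"

if __name__ == "__main__":
    print(round(percent_identity(orf19a_block, orf19_1_block), 1))
```

Over this first block the two sequences differ at a single position (59 of 60 identical). Over the full overlap, 704 identical positions out of 716 would correspond to the stated 98.3%.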

Homology with a predicted ORF from N. gonorrhoeae 

ORF19 shows 95.1% identity over a 102aa overlap with a predicted ORF (ORF19.ng) from N. 
gonorrhoeae: 

orf19.pep   MKTPLLKPLLITSLPVFASVFTAASIVWQLGEPKLAMPFVLGIIAGGLVDLDNXXTGRLK 60 
            |||||||||||||||||||||||||||||||||||||||||||||||||||||  ||||| 
orf19ng     MKTPLLKPLLITSLPVFASVFTAASIVWQLGEPKLAMPFVLGIIAGGLVDLDNRLTGRLK 60 

orf19.pep   NIITTVALFTLSSLTAQSTLGTGLPFILAMTLMTXXFTILGAX 103 
            |||:||||||||||||||||||||||||||||||  |||||| 
orf19ng     NIIATVALFTLSSLTAQSTLGTGLPFILAMTLMTFGFTILGAVGLKYRTFAFGALAVATY 120 

An ORF19ng nucleotide sequence <SEQ ID 109> is predicted to encode a protein having amino 
acid sequence <SEQ ID 110>: 

1 MKTPLLKPLL ITSLPVFASV FTAASIVWQL GEPKLAMPFV LGIIAGGLVD 
51 LDNRLTGRLK NIIATVALFT LSSLTAQSTL GTGLPFILAM TLMTFGFTIL 
101 GAVGLKYRTF AFGALAVATY TTLTYTPETY WLTNPFMILC GTVLYSTAII 
151 LFQIILPHRP VQESVANAYE ALGGYLEAKA DFFDPDEAAW IGNRHIDLAM 
201 SNTGVITAFN QCRSALFYRL RGKHRHPRTA KMLRYYFAAQ DIHERISSAH 
251 VDYQEMSEKF KNTDIIFRIR RLLEMQGQAC RNTAQAIRSG KDYVYSKRLG 
301 RAIEGCRQSL RLLSDGNDSP DIRHLSRLLD NLGSVDQQFR QLRHSDSPAE 
351 NDRMGDTRIA ALETGSFKNT * 

Further work revealed the complete nucleotide sequence <SEQ ID 111>: 

1 ATGAAAACCC CACTCCTCAA GCCTCTGCTC ATTACCTCGC TTCCCGTTTT 
51 CGCCAGTGTC TTTAGCGCCG CCTCCATCGT CTGGCAGCTA GGCGAACCCA 
101 AGCTCGCCAT GCCCTTCGTA CTCGGCATCA TCGCCGGCGG CCTGGTCGAT 
151 TTGGACAACC GCCTGACCGG ACGGCTGAAA AACATCATCG CCACCGTCGC 
201 CCTGTTTACC CTCTCCTCGC TCACGGCGCA AAGCACCCTC GGCACAGGGC 
251 TGCCCTTCAT CCTCGCCATG ACCCTGATGA CCTTCGGCTT TACCATTTTA 
301 GGCGCGGTCG GGCTGAAATA CCGCACCTTC GCCTTCGGCG CACTCGCCGT 
351 CGCCACCTAC ACCACGCTTA CCTACACCCC CGAAACCTAC TGGCTGACCA 
401 ACCCCTTCAT GATTTTATGC GGCACCGTAC TGTACAGCAC CGCCATCATC 
451 CTGTTCCAAA TCATCCTGCC CCACCGCCCC GTCCAAGAAA GCGTCGCCAA 
501 TGCCTACGAA GCACTCGGCG GCTACCTCGA AGCCAAAGCC GACTTCTTCG 
551 ACCCCGATGA GGCAGCCTGG ATAGGCAACC GCCACATCGA CCTCGCCATG 
601 AGCAACACCG GCGTCATCAC CGCCTTCAAC CAATGCCGTT CCGCCCTGTT 
651 TTACCGTTTG CGCGGCAAAC ACCGCCACCC GCGCACCGCC AAAATGCTGC 
701 GCTACTACTT CGCCGCCCAA GACATCCACG AACGCATCAG CTCCGCCCAC 
751 GTCGACTACC AAGAGATGTC CGAAAAATTC AAAAACACCG ACATCATCTT 
801 CCGCATCCGC CGCCTGCTCG AAATGCAGGG GCAGGCGTGC CGCAACACCG 
851 CCCAAGCCAT CCGGTCGGGC AAAGACTAcg tTTACAGCAA ACGCCTCGGA 
901 CGCGCCATcg aaggctgCCG CCAGTCGCtg cgcctCCTTt cagacggcaA 
951 CGACAGTCCC GACATCCGCC ACCTGAGccg CCTTCTCGAC AACCTCGgca 
1001 GCGTcgacca gcagtTCcgc caactCCGAC ACAgcgactC CCCCGCcgaa 
1051 Aacgaccgca tgggcgacaC CCGCATCGCC GCCCtcgaaa ccggcagctT 
1101 caaaaaCAcc tggcaggCAA TCCGTCCGCa gctgaaCCTC GAATCatgCG 
1151 TATTCCGCCA TGCCGTCCGC CTGTCCCTCG TCGTTGCCGC CGCCTGCACC 
1201 ATCGTCgaag cCCTCAACCT CAACCTCGGC TACTGGATAC TGCTGACCGC 
1251 CCTTTTCGTC TGCCAACCCA ACTACACCGC CACCAAAAGC CGCGTGTACC 
1301 AACGCATCGC CGGCACCGTA CTCGGCGTAA TCGTCGGCTC GCTCGTCCCC 
1351 TACTTCACCC CCTCCGTCGA AACCAAACTC TGGATTGTCA TCGCCGGTAC 
1401 CACCCTGTTC TTCATGACCC GCACCTACAA ATACAGTTTC TCCACCTTCT 
1451 TCATCACCAT TCAGGCACTG ACCAGCCTCT CCCTCGCAGG TTTGGACGTA 
1501 TACGCCGCCA TGCCCGTGCG CATCATcgaC ACCATTATCG GCGCATCCCT 
1551 TGCCTGGGCG GCGGTCAGCT ACCTGTGGCC AGACTGGAAA TACCTCACGC 
1601 TCGAACGCAC CGCCGCCCTT GCCGTATGCA GCAGCGGCAC ATACCTCCAA 
1651 AAAATTGCCG AACGCCTCAA AACCGGCGAA ACCGGCGACG ACATAGAATA 
1701 CCGCATCACC CGCCGCCGCG CCCACGAACA CACCGCCGCC CTCAGCAGCA 
1751 CCCTTTCCGA CATGAGCAGC GAACCCGCAA AATTCGCCGA CAGCCTGCAA 
1801 CCCGGCTTTA CCCTGCTCAA AACCGGCTAC GCCCTGACCG GCTACATCTC 
1851 CGCCCTCGGC GCATACCGCA GCGAAATGCA CGAAGAATGC AGCCCCGACT 
1901 TTACCGCACA GTTCCACCTT GCCGCCGAAC ACACCGCCCA CATCTTCCAA 
1951 CACCTGCCCG ACATGGGACC CGACGACTTT CAGACGGCAT TGGATACACT 
2001 GCGCGGCGAA CTCGGCACCC TCCGCACCCG CAGCAGCGGA ACACAAAGCC 
2051 ACATCCTCCT CCAACAGCTC CAACTCATCG CccgGCAACT CGAACCCTAC 
2101 TACCGCGCCT ACCGACAAAT TCCGCACAGG CAGCCCCAAA ACGCAGCCTG 
2151 A 

This corresponds to the amino acid sequence <SEQ ID 112; ORF19ng-1>: 

1 MKTPLLKPLL ITSLPVFASV FTAASIVWQL GEPKLAMPFV LGIIAGGLVD 
51 LDNRLTGRLK NIIATVALFT LSSLTAQSTL GTGLPFILAM TLMTFGFTIL 
101 GAVGLKYRTF AFGALAVATY TTLTYTPETY WLTNPFMILC GTVLYSTAII 
151 LFQIILPHRP VQESVANAYE ALGGYLEAKA DFFDPDEAAW IGNRHIDLAM 
201 SNTGVITAFN QCRSALFYRL RGKHRHPRTA KMLRYYFAAQ DIHERISSAH 
251 VDYQEMSEKF KNTDIIFRIR RLLEMQGQAC RNTAQAIRSG KDYVYSKRLG 
301 RAIEGCRQSL RLLSDGNDSP DIRHLSRLLD NLGSVDQQFR QLRHSDSPAE 
351 NDRMGDTRIA ALETGSFKNT WQAIRPQLNL ESCVFRHAVR LSLVVAAACT 
401 IVEALNLNLG YWILLTALFV CQPNYTATKS RVYQRIAGTV LGVIVGSLVP 
451 YFTPSVETKL WIVIAGTTLF FMTRTYKYSF STFFITIQAL TSLSLAGLDV 
501 YAAMPVRIID TIIGASLAWA AVSYLWPDWK YLTLERTAAL AVCSSGTYLQ 
551 KIAERLKTGE TGDDIEYRIT RRRAHEHTAA LSSTLSDMSS EPAKFADSLQ 
601 PGFTLLKTGY ALTGYISALG AYRSEMHEEC SPDFTAQFHL AAEHTAHIFQ 
651 HLPDMGPDDF QTALDTLRGE LGTLRTRSSG TQSHILLQQL QLIARQLEPY 
701 YRAYRQIPHR QPQNAA* 




ORF19ng-1 and ORF19-1 show 95.5% identity in 716 aa overlap: 

                      10        20        30        40        50        60 
orf19-1.pep   MKTPLLKPLLITSLPVFASVFTAASIVWQLGEPKLAMPFVLGIIAGGLVDLDNRLTGRLK 
              |||||||||||||||||||||||||||||||||||||||||||||||||||||||||||| 
orf19ng-1     MKTPLLKPLLITSLPVFASVFTAASIVWQLGEPKLAMPFVLGIIAGGLVDLDNRLTGRLK 
                      10        20        30        40        50        60 

                      70        80        90       100       110       120 
orf19-1.pep   NIITTVALFTLSSLTAQSTLGTGLPFILAMTLMTFGFTILGAVGLKYRTFAFGALAVATY 
              |||:|||||||||||||||||||||||||||||||||||||||||||||||||||||||| 
orf19ng-1     NIIATVALFTLSSLTAQSTLGTGLPFILAMTLMTFGFTILGAVGLKYRTFAFGALAVATY 
                      70        80        90       100       110       120 

                     130       140       150       160       170       180 
orf19-1.pep   TTLTYTPETYWLTNPFMILCGTVLYSTAILLFQIVLPHRPVQESVANAYDALGGYLEAKA 
              |||||||||||||||||||||||||||||:||||:||||||||||||||:|||||||||| 
orf19ng-1     TTLTYTPETYWLTNPFMILCGTVLYSTAIILFQIILPHRPVQESVANAYEALGGYLEAKA 
                     130       140       150       160       170       180 

                     190       200       210       220       230       240 
orf19-1.pep   DFFDPDEAAWIGNRHIDLAMSNTGVITAFNQCRSALFYRLRGKHRHPRTAKMLRYYFAAQ 
              |||||||||||||||||||||||||||||||||||||||||||||||||||||||||||| 
orf19ng-1     DFFDPDEAAWIGNRHIDLAMSNTGVITAFNQCRSALFYRLRGKHRHPRTAKMLRYYFAAQ 
                     190       200       210       220       230       240 

                     250       260       270       280       290       300 
orf19-1.pep   DIHERISSAHVDYQEMSEKFKNTDIIFRIHRLLEMQGQACRNTAQALRASKDYVYSKRLG 
              |||||||||||||||||||||||||||||:||||||||||||||||:|::|||||||||| 
orf19ng-1     DIHERISSAHVDYQEMSEKFKNTDIIFRIRRLLEMQGQACRNTAQAIRSGKDYVYSKRLG 
                     250       260       270       280       290       300 

                     310       320       330       340       350       360 
orf19-1.pep   RAIEGCRQSLRLLSDSNDSPDIRHLRRLLDNLGSVDQQFRQLQHNGLQAENDRMGDTRIA 
              |||||||||||||||:||||||||| ||||||||||||||||:|:   |||||||||||| 
orf19ng-1     RAIEGCRQSLRLLSDGNDSPDIRHLSRLLDNLGSVDQQFRQLRHSDSPAENDRMGDTRIA 
                     310       320       330       340       350       360 

                     370       380       390       400       410       420 
orf19-1.pep   ALETSSLKNTWQAIRPQLNLESGVFRHAVRLSLVVAAACTIVEALNLNLGYWILLTALFV 
              ||||:|:||||||||||||||| ||||||||||||||||||||||||||||||||||||| 
orf19ng-1     ALETGSFKNTWQAIRPQLNLESCVFRHAVRLSLVVAAACTIVEALNLNLGYWILLTALFV 
                     370       380       390       400       410       420 

                     430       440       450       460       470       480 
orf19-1.pep   CQPNYTATKSRVRQRIAGTVLGVIVGSLVPYFTPSVETKLWIVIASTTLFFMTRTYKYSF 
              |||||||||||| ||||||||||||||||||||||||||||||||:|||||||||||||| 
orf19ng-1     CQPNYTATKSRVYQRIAGTVLGVIVGSLVPYFTPSVETKLWIVIAGTTLFFMTRTYKYSF 
                     430       440       450       460       470       480 

                     490       500       510       520       530       540 
orf19-1.pep   STFFITIQALTSLSLAGLDVYAAMPVRIIDTIIGASLAWAAVSYLWPDWKYLTLERTAAL 
              |||||||||||||||||||||||||||||||||||||||||||||||||||||||||||| 
orf19ng-1     STFFITIQALTSLSLAGLDVYAAMPVRIIDTIIGASLAWAAVSYLWPDWKYLTLERTAAL 
                     490       500       510       520       530       540 

                     550       560       570       580       590       600 
orf19-1.pep   AVCSNGAYLEKITERLKSGETGDDVEYRATRRRAHEHTAALSSTLSDMSSEPAKFADSLQ 
              ||||:|:||:||:||||:||||||:||| ||||||||||||||||||||||||||||||| 
orf19ng-1     AVCSSGTYLQKIAERLKTGETGDDIEYRITRRRAHEHTAALSSTLSDMSSEPAKFADSLQ 
                     550       560       570       580       590       600 

                     610       620       630       640       650       660 
orf19-1.pep   PGFTLLKTGYALTGYISALGAYRSEMHEECSPDFTAQFHLAAEHTAHIFQHLPETEPDDF 
              |||||||||||||||||||||||||||||||||||||||||||||||||||||:  |||| 
orf19ng-1     PGFTLLKTGYALTGYISALGAYRSEMHEECSPDFTAQFHLAAEHTAHIFQHLPDMGPDDF 
                     610       620       630       640       650       660 

                     670       680       690       700       710 
orf19-1.pep   QTALDTLRGELDTLRTHSSGTQSHILLQQLQLIARQLEPYYRAYRQIPHRQPQNAAX 
              ||||||||||| ||||:|||||||||||||||||||||||||||||||||||||||| 
orf19ng-1     QTALDTLRGELGTLRTRSSGTQSHILLQQLQLIARQLEPYYRAYRQIPHRQPQNAAX 
                     670       680       690       700       710 

In addition, ORF19ng-1 shows significant homology to a hypothetical gonococcal protein 
previously entered in the databases: 

sp|033369|YOR2_NEIGO HYPOTHETICAL 45.5 KD PROTEIN (ORF2) gnlj PID| ell54438 
(AJ002423) hypothetical protein [Neisseria gonorrh] Length =417 
Score = 1512 (705.6 bits). Expect = 5.3e-203, P = 5.3e-203 
Identities = 301/326 (92%), Positives = 306/326 (93%) 

Query: 307 RQSLRLLSDGNDSPDIRHLSRLLDNLGSVDQQFRQLRHSDSPAENDRMGDTRIAALETGS 366 
           RQSLRLLSDGNDS DIRHLSRLLDNLGSVDQQFRQLRHSDSPAENDRMGDTRIAALETGS 
Sbjct: 1 

Query: 367 FKNTWQAIRPQLNLESCVFRHAVRLSLVVAAACTIVEALNLNLGYWILLTALFVCQPNYT 426 
           FKNTWQAIRPQLNLES VFRHAVRLSLVVAAACTIVEALNLNLGYWILLT LFVCQPNYT 
Sbjct: 61 

Query: 427 ATKSRVYQRIAGTVLGVIVGSLVPYFTPSVETKLWIVIAGTTLFFMTRTYKYSFSTFFIT 486 
           ATKSRVYQRIAGTVLGVIVGSLVPYFTPSVETKLWIVIAGTTLFFMTRTYKYSFSTFFIT 
Sbjct: 121 

Query: 487 IQALTSLSLAGLDVYAAMPVRIIDTIIGASLAWAAVSYLWPDWKYLTLERTAALAVCSSG 546 
           IQALTSLSLAGLDVYAAMPVRIIDTIIGASLAWAAVSYLWPDWKYLTLERTAALAVCSSG 
Sbjct: 181 

Query: 547 TYLQKIAERLKTGETGDDIEYRITRRRAHEHTAALSSTLSDMSSEPAKFADSLQPGFTLL 606 
           TYLQKIAERLKTGETGDDIEYRITRRRAHEHTAALSSTLSDMSSEPAKFAD+  P 
Sbjct: 241 TYLQKIAERLKTGETGDDIEYRITRRRAHEHTAALSSTLSDMSSEPAKFADTCNPALPCS 300 

Query: 607 KTGYALTGYISALGAYRSEMHEECSP 632 
           K   ALTGYISALG   ++  +  +P 
Sbjct: 301 KPATALTGYISALGHTAAKCTKNAAP 326 

Based on this analysis, including the presence of several putative transmembrane domains in the 
gonococcal protein (the first of which is also seen in the meningococcal protein), and on homology 
with the YHFK protein, it is predicted that the proteins from N.meningitidis and N.gonorrhoeae, 
and their epitopes, could be useful antigens for vaccines or diagnostics, or for raising antibodies. 
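
The text does not say which program was used to identify the putative transmembrane domains. One conventional way to flag candidate membrane-spanning segments is a sliding-window hydropathy scan, sketched below. Everything specific in it is an assumption made for illustration: the Kyte-Doolittle hydropathy scale, the 19-residue window, the 1.6 cutoff, and the function names `hydropathy_profile` and `candidate_tm_segments` are not taken from the patent.

```python
# Kyte-Doolittle hydropathy values (assumed scale; not named in the text).
KD = {"A": 1.8, "R": -4.5, "N": -3.5, "D": -3.5, "C": 2.5, "Q": -3.5,
      "E": -3.5, "G": -0.4, "H": -3.2, "I": 4.5, "L": 3.8, "K": -3.9,
      "M": 1.9, "F": 2.8, "P": -1.6, "S": -0.8, "T": -0.7, "W": -0.9,
      "Y": -1.3, "V": 4.2}


def hydropathy_profile(protein, window=19):
    """Mean hydropathy of every full window; unknown residues score 0."""
    scores = [KD.get(aa, 0.0) for aa in protein.upper()]
    return [sum(scores[i:i + window]) / window
            for i in range(len(scores) - window + 1)]


def candidate_tm_segments(protein, window=19, threshold=1.6):
    """Return merged 1-based (start, end) spans whose window mean >= threshold."""
    segments = []
    for i, mean in enumerate(hydropathy_profile(protein, window)):
        if mean < threshold:
            continue
        start, end = i + 1, i + window
        if segments and start <= segments[-1][1]:
            # Overlaps the previous span: extend it instead of adding a new one.
            segments[-1] = (segments[-1][0], end)
        else:
            segments.append((start, end))
    return segments


if __name__ == "__main__":
    # N-terminal 60 residues of ORF19-1, copied from SEQ ID 106.
    n_term = "MKTPLLKPLLITSLPVFASVFTAASIVWQLGEPKLAMPFVLGIIAGGLVDLDNRLTGRLK"
    print(candidate_tm_segments(n_term))
```

A scan of this kind only nominates hydrophobic stretches long enough to span a membrane; it does not establish topology, and dedicated prediction programs may give different boundaries.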

Example 14 

The following DNA sequence, believed to be complete, was identified in N.meningitidis <SEQ ID 
113>: 

1 ATGAATATGC TGGGAGCTTT GGCAAAAGTC GGCAGCCTGA CGATGGTGTC 

51 GCGCGTTTTG GGATTTGTGC GCGATACGGT CATTGCGCGG GCATTCGGCG 

101 CGGGTATGGC GACGGATGCG TTTTTTGTCG CGTTCAAACT GCCCAACCTG 

151 CTTCGCCGCG TGTTTGCGGA GGGGGCGTTT GCCCAAGCGT TTGTGCCGAT 

201 TTTGGCGGAA TACAAGGTVAA CGCGTTCAAA AGAGGCGG.C GAAGCCTTTA 

251 TCCGCCATGT GGCGGGGATG CTGTCGTTTG TACTGGTTAT CGTTACCGCG 

301 CTGGGCATAC TTGCCGCGCC TTGGGTGATT TATGTTTCCG CACCCgAGTT 

351 TTGCCCAAGA TGCCGACAAA TTTCAGCTCT CCATCGATTT GCTGCGGATT 

401 ACGTTTCCTT ATATATTATT GATTTCCCTG TCTTCATTTG TCGGCTCGGT 

451 ACTCAATTCT TATCATAAGT TCGGCATTCC GGCGTTTACG CCAC.GTTTC 

501 TGAACGTGTC GTTTATCGTA TTCGCGCTGT TTTTCGTGCC GTATTTCGAT 

551 CCGCCCGTTA CCGCGCyGGC GTGGGCGGTC TTTGTCGGCG GCATTTTGCA 

601 ACTCGrmTTC CAACTGCCCT GGCTGGCGAA ACTGGGCTTT TTGAAACTGC 

651 CCAAACtGAG TTTCAAAGAT GCGGCGGTCA ACCGCGTGAT GAAACAGATG 

701 GCGCCTGCgA TTTTgGGCGT GAgCGTGGCG CAGGTTTCTT TGGTGATCAA 

751 CACGATTTTc GCGTCTTATC TGCAATCGGG CAGCGTTTCA TGGATGTATT 

801 ACGCCGACCG CATGATGGAG CTGCCCAGCG GCGTGCTGGG GGCGGCACTC 

851 GGTACGATTT TGCTGCCGAC TTTGTCCAAA CACTCGGCAA ACCaAGATAC 

901 GGaACAGTTT TCCGCCCTGC TCGACTGGGG TTTGCGCCTG TGCATGCtgc 

951 TGACGCTGCC GGCGgcGGTC GGACTGGCGG TGTTGTCGTT cCCgCtGGTG 

1001 GCGACGCTGT TTATGTACCG CGwATTTACG CTGTTTGACG CGCAGATGAC 

1051 GCAACACGCG CTGATTGCCT ATTCTTTCGG TTTAATCGGC TTAATCATGA 

1101 TTAAAGTGTT GGCACCCGGC TTCTATGCGC GGCAAAACAT CAAwAmGCCC 

1151 GTCAAAATCG CCATCTTCAC GCTCATCTGC mCGCAGTTGA TGAACCTTGs 

1201 CTTTAyCGGC CCACTrrAAC rCa^TCGGAC TTTCGCTTGC CATCGGTCTG 

1251 GGCGCGTGTA TCAATGCCGG ATTGTTGTTT TACCTGTTGC GCAGACACGG 

1301 TATTTACCAA CCTGG.CAAG GGTTGGGCAG CGTTCTT.AG CAAAAATGCT 

1351 GcTCTCGCTC GCCGTGA 

This corresponds to the amino acid sequence <SEQ ID 114; ORF20>: 

1 MNMLGALAKV GSLTMVSRVL GFVRDTVIAR AFGAGMATDA FFVAFKLPNL 

51 LRRVFAEGAF AQAFVPILAE YKETRSKEAX EAFIRHVAGM LSFVLVIVTA 

101 LGILAAPWVI YVSAPSFAQD ADKFQLSIDL LRITFPYILL ISLSSFVGSV 

151 LNSYHKFGIP AFTPXFLNVS FIVFALFFVP YFDPPVTAXA WAVFVGGILQ 

201 LXFQLPWLAK LGFLKLPKLS FKDAAVNRVM KQMAPAILGV SVAQVSLVIN 

251 TIFASYLQSG SVSWMYYADR MMELPSGVLG AALGTILLPT LSKHSANQDT 

301 EQFSALLDWG LRLCMLLTLP AAVGLAVLSF PLVATLFMYR XFTLFDAQMT 

351 QHALIAYSFG LIGLIMIKVL APGFYARQNI XXPVKIAIFT LICXQLMNLX 

401 FXGPLXXIGL SLAIGLGACI NAGLLFYLLR RHGIYQPXQG LGSVLXQKCC 

451 SRSP* 

These sequences were elaborated, and the complete DNA sequence <SEQ ID 115> is: 

1 ATGAATATGC TGGGAGCTTT GGCAAAAGTC GGCAGCCTGA CGATGGTGTC 
51 GCGCGTTTTG GGATTTGTGC GCGATACGGT CATTGCGCGG GCATTCGGCG 
101 CGGGTATGGC GACGGATGCG TTTTTTGTCG CGTTCAAACT GCCCAACCTG 

151 CTTCGCCGCG TGTTTGCGGA GGGGGCGTTT GCCCAAGCGT TTGTGCCGAT 

201 TTTGGCGGAA TACAAGGAAA CGCGTTCAAA AGAGGCGGCG GAGGCTTTTA 

251 TCCGCCATGT GGCGGGGATG CTGTCGTTTG TACTGGTTAT CGTTACCGCG 

301 CTGGGCATAC TTGCCGCGCC TTGGGTGATT TATGTTTCCG CACCCGGTTT 

351 TGCCCAAGAT GCCGACAAAT TTCAGCTCTC CATCGATTTG CTGCGGATTA 

401 CGTTTCCTTA TATATTATTG ATTTCCCTGT CTTCATTTGT CGGCTCGGTA 

451 CTCAATTCTT ATCATAAGTT CGGCATTCCG GCGTTTACGC CCACGTTTCT 

501 GAACGTGTCG TTTATCGTAT TCGCGCTGTT TTTCGTGCCG TATTTCGATC 

551 CGCCCGTTAC CGCGCTGGCG TGGGCGGTCT TTGTCGGCGG CATTTTGCAA 

601 CTCGGCTTCC AACTGCCCTG GCTGGCGAAA CTGGGCTTTT TGAAACTGCC 

651 CAAACTGAGT TTCAAAGATG CGGCGGTCAA CCGCGTGATG AAACAGATGG 

701 CGCCTGCGAT TTTGGGCGTG AGCGTGGCGC AGGTTTCTTT GGTGATCAAC 

751 ACGATTTTCG CGTCTTATCT GCAATCGGGC AGCGTTTCAT GGATGTATTA 

801 CGCCGACCGC ATGATGGAGC TGCCCAGCGG CGTGCTGGGG GCGGCACTCG 

851 GTACGATTTT GCTGCCGACT TTGTCCAAAC ACTCGGCAAA CCAAGATACG 

901 GAACAGTTTT CCGCCCTGCT CGACTGGGGT TTGCGCCTGT GCATGCTGCT 

951 GACGCTGCCG GCGGCGGTCG GACTGGCGGT GTTGTCGTTC CCGCTGGTGG 

1001 CGACGCTGTT TATGTACCGC GAATTTACGC TGTTTGACGC GCAGATGACG 

1051 CAACACGCGC TGATTGCCTA TTCTTTCGGT TTAATCGGCT TAATCATGAT 

1101 TAAAGTGTTG GCACCCGGCT TCTATGCGCG GCAAAACATC AAAACGCCCG 

1151 TCAAAATCGC CATCTTCACG CTCATCTGCA CGCAGTTGAT GAACCTTGCC 

1201 TTTATCGGCC CACTGAAACA CGTCGGACTT TCGCTTGCCA TCGGTCTGGG 

1251 CGCGTGTATC AATGCCGGAT TGTTGTTTTA CCTGTTGCGC AGACACGGTA 

1301 TTTACCAACC TGGCAAGGGT TGGGCAGCGT TCTTAGCAAA AATGCTGCTC 

1351 TCGCTCGCCG TGATGTGCGG CGGACTGTGG GCAGCGCAGG CTTACCTGCC 

1401 GTTTGAATGG GCGCACGCCG GCGGAATGCG GAAAGCGGGG CAGCTCTGCA 

1451 TCCTGATTGC CGTCGGCGGC GGACTGTATT TCGCATCACT GGCGGCTTTG 

1501 GGCTTCCGTC CGCGCCATTT CAAACGCGTG GAAAACTGA 

This corresponds to the amino acid sequence <SEQ ID 116; ORF20-1>: 

1 MNMLGALAKV GSLTMVSRVL GFVRDTVIAR AFGAGMATDA FFVAFKLPNL 
51 LRRVFAEGAF AQAFVPILAE YKETRSKEAA EAFIRHVAGM LSFVLVIVTA 
101 LGILAAPWVI YVSAPGFAQD ADKFQLSIDL LRITFPYILL ISLSSFVGSV 
151 LNSYHKFGIP AFTPTFLNVS FIVFALFFVP YFDPPVTALA WAVFVGGILQ 
201 LGFQLPWLAK LGFLKLPKLS FKDAAVNRVM KQMAPAILGV SVAQVSLVIN 
251 TIFASYLQSG SVSWMYYADR MMELPSGVLG AALGTILLPT LSKHSANQDT 
301 EQFSALLDWG LRLCMLLTLP AAVGLAVLSF PLVATLFMYR EFTLFDAQMT 
351 QHALIAYSFG LIGLIMIKVL APGFYARQNI KTPVKIAIFT LICTQLMNLA 
401 FIGPLKHVGL SLAIGLGACI NAGLLFYLLR RHGIYQPGKG WAAFLAKMLL 
451 SLAVMCGGLW AAQAYLPFEW AHAGGMRKAG QLCILIAVGG GLYFASLAAL 
501 GFRPRHFKRV EN* 

Computer analysis of this amino acid sequence gave the following results: 



Homology with the MviN virulence factor of S. typhimurium (accession number P37169) 
ORF20 and MviN proteins show 63% aa identity in 440aa overlap: 

Orf20 1 MNMLGALAKVGSLTMVSRVLGFVRDTVIARAFGAGMATDAFFVAFKLPNLLRRVFAEGAF 60 

MN+L +LA V S+TM SRVLGF RD ++AR FGAGMATDAFFVAFKLPNLLRR+FAEGAF 
MviN 14 MNLLKSLAAVSSMTMFSRVLGFARDAIVARIFGAGMATDAFFVAFKLPNLLRRIFAEGAF 73 



Orf20 61 AQAFVPILAEYKETRSKEAXEAFIRHVAGMLSFVLVIVTALGILAAPWVIYVSAPSFAQD 120 

+QAFVPILAEYK + +EA F+ +V+G+L+ L +VT G+LAAPWVI V+AP FA 

MviN 74 SQAFVPIIAEYKSKQGEEATRIFVAYVSGLLTUUAVVTVAGMLAAPWVIMVTAPGFADT 133 

Orf20 121 ADKFQLSIDLLRITFPYILLISLSSFVGSVLNSYHKFGIPAFTPXFLNVSFIVFALFFVP 180 

ADKF L+ LLRITFPYILLISL+S VG++LN++++F IPAF P FLN+S I FALF P 

MviN 134 ADKFALTTQLLRITFPYILLISLASLVGAILNTWNRFSIPAFAPTFLNISMIGFALFAAP 193 

Orf20 181 YFDPPVTAXAWAVFVGGILQLXFQLPWLAKLGFLKLPKLSFKDAAVNRVMKQMAPAILGV 240 

YF+PPV A AWAV VGG+LQL +QLP+L K+G L LP+++F+D RV+KQM PAILGV 

MviN 194 YFNPPVLALAWAVTVGGVLQLVYQLPYLKKIGMLVLPRINFRDTGAMRVVKQMGPAILGV 253 



Orf20 241 SVAQVSLVINTIFASYLQSGSVSWMYYADRMMELPSGVLGAALGTILLPTLSKHSANQDT 300 

SV+Q+SL+INTIFAS+L SGSVSWMYYADR+ME PSGVLG ALGTILLP+LSK A+ + 
MviN 254 SVSQISLIINTIFASFLASGSVSWMYYADRLMEFPSGVLGVALGTILLPSLSKSFASGNH 313 

Orf20 301 EQFSALLDWGLRLCMLLTLPAAVGLAVLSFPLVATLFMYRXFTLFDAQMTQHALIAYSFG 360 

+++ L+DWGLRLC LL LP+AV L +L+ PL +LF Y FT FDA MTQ ALIAYS G 
MviN 314 DEYCRLMDWGLRLCFLLALPSAVALGILAKPLTVSLFQYGKFTAFDAAMTQRALIAYSVG 373 

Orf20 361 LIGLIMIKVLAPGFYARQNIXXPVKIAIFTLICXQLMNLXFXGPLXXIGLSLAIGLGACI 420 

LIGLI++KVLAPGFY+RQ+I PVKIAI TLX QLMNL F 
MviN 374 

Orf20 421 NAGLLFYLLRRHGIYQPXQG 440 

NA LL++ LR+ I+ P G 
MviN 434 

C+ 

Homology with a predicted ORF from N.meningitidis (strain A) 

ORF20 shows 93.5% identity over a 447aa overlap with an ORF (ORF20a) from strain A of N. 
meningitidis: 

                    10        20        30        40        50        60 
orf20.pep   MNMLGALAKVGSLTMVSRVLGFVRDTVIARAFGAGMATDAFFVAFKLPNLLRRVFAEGAF 
            |||||||:|||||||||||||||||||||||||||||||||||||||||||||||||||| 
orf20a      MNMLGALVKVGSLTMVSRVLGFVRDTVIARAFGAGMATDAFFVAFKLPNLLRRVFAEGAF 
                    10        20        30        40        50        60 

                    70        80        90       100       110       120 
orf20.pep   AQAFVPILAEYKETRSKEAXEAFIRHVAGMLSFVLVIVTALGILAAPWVIYVSAPSFAQD 
            |||||||||||||||||||:|||||||||||||||||||||||||||||||||||:||:| 
orf20a      AQAFVPILAEYKETRSKEATEAFIRHVAGMLSFVLVIVTALGILAAPWVIYVSAPGFAKD 
                    70        80        90       100       110       120 

                   130       140       150       160       170       180 
orf20.pep   ADKFQLSIDLLRITFPYILLISLSSFVGSVLNSYHKFGIPAFTPXFLNVSFIVFALFFVP 
            |||||||||||||||||||||||||||||||||||||:||||||:||||||||||||||| 
orf20a      ADKFQLSIDLLRITFPYILLISLSSFVGSVLNSYHKFSIPAFTPTFLNVSFIVFALFFVP 
                   130       140       150       160       170       180 

                   190       200       210       220       230       240 
orf20.pep   YFDPPVTAXAWAVFVGGILQLXFQLPWLAKLGFLKLPKLSFKDAAVNRVMKQMAPAILGV 
            |||||||| |||||||||||| |||||||||||||||||||||||||||||||||||||| 
orf20a      YFDPPVTALAWAVFVGGILQLGFQLPWLAKLGFLKLPKLSFKDAAVNRVMKQMAPAILGV 
                   190       200       210       220       230       240 

                   250       260       270       280       290       300 
orf20.pep   SVAQVSLVINTIFASYLQSGSVSWMYYADRMMELPSGVLGAALGTILLPTLSKHSANQDT 
            ||||:||||||||||||||||||||||||||||||:|||||||||||||||||||||||| 
orf20a      SVAQISLVINTIFASYLQSGSVSWMYYADRMMELPGGVLGAALGTILLPTLSKHSANQDT 
                   250       260       270       280       290       300 

                   310       320       330       340       350       360 
orf20.pep   EQFSALLDWGLRLCMLLTLPAAVGLAVLSFPLVATLFMYRXFTLFDAQMTQHALIAYSFG 
            |||||||||||| |||||||||||:||||||||||||||| ||||||||||||||||||| 
orf20a      EQFSALLDWGLRXCMLLTLPAAVGMAVLSFPLVATLFMYREFTLFDAQMTQHALIAYSFG 
                   310       320       330       340       350       360 

                   370       380       390       400       410       420 
orf20.pep   LIGLIMIKVLAPGFYARQNIXXPVKIAIFTLICXQLMNLXFXGPLXXIGLSLAIGLGACI 
            |||||||||||||||||||| :|||||||||||:||||| | |||  :|||||||||||| 
orf20a      LIGLIMIKVLAPGFYARQNIKTPVKIAIFTLICTQLMNLAFIGPLKHVGLSLAIGLGACI 
                   370       380       390       400       410       420 

                   430       440       450 
orf20.pep   NAGLLFYLLRRHGIYQPXQGLGSVLXQKCCSRSPX 
            ||||||||||||||||| :| :: | : 
orf20a      NAGLLFYLLRRHGIYQPGKGWAAFLAKMLLSLAVMGGGLYAAQIWLPFDWAHAGGMQKAA 
                   430       440       450       460       470       480 

The complete length ORF20a nucleotide sequence <SEQ ID 117> is: 






1 ATGAATATGC TGGGAGCTTT GGTAAAAGTC GGCAGCCTGA CGATGGTGTC 
51 GCGCGTTTTG GGATTTGTGC GCGATACGGT CATTGCGCGC GCATTCGGCG 
101 CAGGCATGGC GACGGATGCG TTCTTTGTCG CGTTCAAACT GCCCAACCTG
151 CTTCGCCGCG TGTTTGCGGA GGGGGCGTTT GCCCAAGCGT TTGTGCCGAT

201 TTTGGCGGAA TATAAGGAAA CGCGTTCTAA AGAGGCGACG GAGGCTTTTA 

251 TCCGCCATGT GGCGGGGATG CTGTCGTTTG TACTGGTCAT CGTTACCGCG 

301 CTGGGCATAC TTGCCGCGCC TTGGGTGATT TATGTTTCCG CACCCGGTTT 

351 TGCCAAAGAT GCCGACAAAT TTCAGCTCTC TATCGATTTG CTGCGGATTA 

401 CGTTTCCTTA TATCTTATTG ATTTCACTTT CCTCTTTTGT CGGCTCGGTA

451 CTCAATTCCT ATCATAAATT CAGCATTCCT GCGTTTACGC CCACGTTCCT

501 GAACGTGTCG TTTATCGTAT TCGCGCTGTT TTTCGTGCCG TATTTCGATC 

551 CTCCCGTTAC CGCGCTGGCT TGGGCGGTTT TTGTCGGCGG CATTTTGCAA 

601 CTCGGCTTCC AACTGCCCTG GCTGGCGAAA CTGGGTTTTT TGAAACTGCC

651 CAAACTGAGT TTCAAAGATG CGGCGGTCAA CCGCGTGATG AAACAGATGG 

701 CGCCTGCGAT TTTGGGCGTG AGCGTGGCGC AGATTTCTTT GGTGATCAAC 

751 ACGATTTTCG CGTCTTATCT GCAATCGGGC AGCGTTTCAT GGATGTATTA 

801 CGCCGACCGC ATGATGGAAC TGCCCGGCGG CGTGCTGGGG GCGGCACTCG 

851 GTACGATTTT GCTGCCGACT TTGTCCAAAC ACTCGGCAAA CCAAGATACG 

901 GAACAGTTTT CCGCCCTGCT CGACTGGGGT TTGCGCNTGT GCATGCTGCT 

951 GACGCTGCCG GCGGCGGTCG GAATGGCGGT GTTGTCGTTC CCGCTGGTGG 

1001 CTACCTTGTT TATGTACCGA GAATTCACGC TGTTTGACGC GCAGATGACG

1051 CAACACGCGC TGATTGCCTA TTCTTTCGGT TTAATCGGTT TAATCATGAT 

1101 TAAAGTGTTG GCGCCCGGCT TTTATGCGCG GCAAAACATC AAAACGCCCG 

1151 TCAAAATCGC CATCTTCACG CTCATTTGCA CGCAGTTGAT GAACCTTGCC 

1201 TTTATCGGCC CACTGAAACA CGTCGGACTT TCGCTTGCCA TCGGTCTGGG 

1251 CGCGTGTATC AATGCCGGAT TGTTGTTTTA CCTGTTGCGC AGACACGGTA 

1301 TTTACCAACC TGGCAAGGGT TGGGCAGCGT TCTTGGCAAA AATGCTGCTC 

1351 TCGCTCGCCG TGATGGGAGG CGGCCTGTAT GCCGCCCAAA TCTGGCTGCC 

1401 GTTCGACTGG GCACACGCCG GCGGAATGCA AAAGGCCGCC CGGCTCTTCA 

1451 TCCTGATTGC CGTCGGCGGC GGACTGTATT TCGCATCACT GGCGGCTTTG

1501 GGCTTCCGTC CGCGCCATTT CAAACGCGTG GAAAGCTGA 

This encodes a protein having amino acid sequence <SEQ ID 118>:



1 MNMLGALVKV GSLTMVSRVL GFVRDTVIAR AFGAGMATDA FFVAFKLPNL

51 LRRVFAEGAF AQAFVPILAE YKETRSKEAT EAFIRHVAGM LSFVLVIVTA

101 LGILAAPWVI YVSAPGFAKD ADKFQLSIDL LRITFPYILL ISLSSFVGSV

151 LNSYHKFSIP AFTPTFLNVS FIVFALFFVP YFDPPVTALA WAVFVGGILQ

201 LGFQLPWLAK LGFLKLPKLS FKDAAVNRVM KQMAPAILGV SVAQISLVIN

251 TIFASYLQSG SVSWMYYADR MMELPGGVLG AALGTILLPT LSKHSANQDT

301 EQFSALLDWG LRXCMLLTLP AAVGMAVLSF PLVATLFMYR EFTLFDAQMT

351 QHALIAYSFG LIGLIMIKVL APGFYARQNI KTPVKIAIFT LICTQLMNLA

401 FIGPLKHVGL SLAIGLGACI NAGLLFYLLR RHGIYQPGKG WAAFLAKMLL

451 SLAVMGGGLY AAQIWLPFDW AHAGGMQKAA RLFILIAVGG GLYFASLAAL

501 GFRPRHFKRV ES*
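
The relationship between a listed nucleotide sequence and the protein it encodes can be checked with a short script. The following is a minimal sketch, assuming the standard genetic code; the function name `translate` and the convention of writing `X` for any codon containing an ambiguous base (such as `N`) are illustrative assumptions chosen to mirror the `X` residues in the listings, not a description of the software used to prepare them.

```python
from itertools import product

# Standard genetic code, codons enumerated in TCAG order (assumption:
# the listings follow the standard bacterial/universal table).
BASES = "TCAG"
AMINO = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {"".join(c): a for c, a in zip(product(BASES, repeat=3), AMINO)}


def translate(dna):
    """Translate a DNA string in reading frame 1.

    Whitespace is ignored and case is folded, so the 10-base blocks of a
    listing can be pasted in directly. Codons containing a base other
    than A, C, G or T are reported as 'X'; stop codons as '*'.
    """
    dna = "".join(dna.split()).upper()
    protein = []
    for i in range(0, len(dna) - 2, 3):
        protein.append(CODON_TABLE.get(dna[i:i + 3], "X"))
    return "".join(protein)


if __name__ == "__main__":
    # First 30 bases of the ORF20a listing.
    print(translate("ATGAATATGC TGGGAGCTTT GGTAAAAGTC"))
```

Applied to the opening bases of the ORF20a listing, the sketch yields the first ten residues of the protein; a codon such as `NTG` is reported as `X`, which is how an unresolved base in the nucleotide listing surfaces in the translation.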

ORF20a and ORF20-1 show 96.5% identity in 512 aa overlap: 



                    10        20        30        40        50        60
orf20a.pep  MNMLGALVKVGSLTMVSRVLGFVRDTVIARAFGAGMATDAFFVAFKLPNLLRRVFAEGAF
            |||||||:||||||||||||||||||||||||||||||||||||||||||||||||||||
orf20-1     MNMLGALAKVGSLTMVSRVLGFVRDTVIARAFGAGMATDAFFVAFKLPNLLRRVFAEGAF
                    10        20        30        40        50        60

                    70        80        90       100       110       120
orf20a.pep  AQAFVPILAEYKETRSKEATEAFIRHVAGMLSFVLVIVTALGILAAPWVIYVSAPGFAKD
            |||||||||||||||||||:||||||||||||||||||||||||||||||||||||||:|
orf20-1     AQAFVPILAEYKETRSKEAAEAFIRHVAGMLSFVLVIVTALGILAAPWVIYVSAPGFAQD
                    70        80        90       100       110       120

                   130       140       150       160       170       180
orf20a.pep  ADKFQLSIDLLRITFPYILLISLSSFVGSVLNSYHKFSIPAFTPTFLNVSFIVFALFFVP
            |||||||||||||||||||||||||||||||||||||:||||||||||||||||||||||
orf20-1     ADKFQLSIDLLRITFPYILLISLSSFVGSVLNSYHKFGIPAFTPTFLNVSFIVFALFFVP
                   130       140       150       160       170       180

                   190       200       210       220       230       240
orf20a.pep  YFDPPVTALAWAVFVGGILQLGFQLPWLAKLGFLKLPKLSFKDAAVNRVMKQMAPAILGV
            ||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
orf20-1     YFDPPVTALAWAVFVGGILQLGFQLPWLAKLGFLKLPKLSFKDAAVNRVMKQMAPAILGV
                   190       200       210       220       230       240

                   250       260       270       280       290       300
orf20a.pep  SVAQISLVINTIFASYLQSGSVSWMYYADRMMELPGGVLGAALGTILLPTLSKHSANQDT
            ||||:||||||||||||||||||||||||||||||:||||||||||||||||||||||||
orf20-1     SVAQVSLVINTIFASYLQSGSVSWMYYADRMMELPSGVLGAALGTILLPTLSKHSANQDT
                   250       260       270       280       290       300

                   310       320       330       340       350       360
orf20a.pep  EQFSALLDWGLRXCMLLTLPAAVGMAVLSFPLVATLFMYREFTLFDAQMTQHALIAYSFG
            |||||||||||| |||||||||||:|||||||||||||||||||||||||||||||||||
orf20-1     EQFSALLDWGLRLCMLLTLPAAVGLAVLSFPLVATLFMYREFTLFDAQMTQHALIAYSFG
                   310       320       330       340       350       360

                   370       380       390       400       410       420
orf20a.pep  LIGLIMIKVLAPGFYARQNIKTPVKIAIFTLICTQLMNLAFIGPLKHVGLSLAIGLGACI
            ||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
orf20-1     LIGLIMIKVLAPGFYARQNIKTPVKIAIFTLICTQLMNLAFIGPLKHVGLSLAIGLGACI
                   370       380       390       400       410       420

                   430       440       450       460       470       480
orf20a.pep  NAGLLFYLLRRHGIYQPGKGWAAFLAKMLLSLAVMGGGLYAAQIWLPFDWAHAGGMQKAA
            ||||||||||||||||||||||||||||||||||| |||:||| :|||:|||||||:||:
orf20-1     NAGLLFYLLRRHGIYQPGKGWAAFLAKMLLSLAVMCGGLWAAQAYLPFEWAHAGGMRKAG
                   430       440       450       460       470       480

                   490       500       510
orf20a.pep  RLFILIAVGGGLYFASLAALGFRPRHFKRVESX
            :| |||||||||||||||||||||||||||||:|
orf20-1     QLCILIAVGGGLYFASLAALGFRPRHFKRVENX
                   490       500       510
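
The percent-identity figures quoted for these alignments can be recomputed from the aligned rows. The sketch below assumes two already-aligned strings of equal length, counts a column as identical only when both residues are the same letter, and skips columns in which either row has a gap character `-`. The function name `percent_identity` and the treatment of `X` as an ordinary non-matching letter are illustrative assumptions, not a description of the alignment program used here.

```python
def percent_identity(row_a, row_b):
    """Return (percent identity, matches, compared columns).

    Both rows must come from the same alignment and therefore have the
    same length. Columns containing a gap ('-') in either row are not
    counted in the overlap.
    """
    if len(row_a) != len(row_b):
        raise ValueError("aligned rows must have equal length")
    pairs = [(a, b) for a, b in zip(row_a.upper(), row_b.upper())
             if a != "-" and b != "-"]
    if not pairs:
        raise ValueError("no comparable columns")
    matches = sum(a == b for a, b in pairs)
    return 100.0 * matches / len(pairs), matches, len(pairs)


if __name__ == "__main__":
    # First 60 columns of the ORF20a / ORF20-1 alignment.
    orf20a = "MNMLGALVKVGSLTMVSRVLGFVRDTVIARAFGAGMATDAFFVAFKLPNLLRRVFAEGAF"
    orf20_1 = "MNMLGALAKVGSLTMVSRVLGFVRDTVIARAFGAGMATDAFFVAFKLPNLLRRVFAEGAF"
    pct, matches, total = percent_identity(orf20a, orf20_1)
    print(f"{matches}/{total} identical ({pct:.1f}%)")
```

Concatenating all rows of an alignment before calling the function gives the figure for the whole overlap.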






Homology with a predicted ORF from N. gonorrhoeae

ORF20 shows 92.1% identity over a 454aa overlap with a predicted ORF (ORF20ng) from N.
gonorrhoeae:
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MNMLGALAKVGSLTMVSRVLGFVRDTVIARAFGAGMATDAFFVAFKLPNLLRRVFAEGAF 60 

AQAFVPILAEYKETRSKEAXEAFIRHVAGMLSFVLVIVTALGILAAPWVIYVSAPSFAQD 120 
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AQAFVPILAEYKETRSKEATEAFIRHVAGMLSFVLIWTALGILAAPWVIYVSAPGFTKD 120 
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EQFSALLDWGLRLCMLLTLPAAVGLAVLSFPLVATLFMYRXFTLFDAQMTQHALIAYSFG 
t i t t t I I t t I t I ! II t t I t I t t : I t t t I I t I II I t I t t t t f li I I I I t It It It I It I I 
EQFSALLDWGLRLCMLLTLPAAAGLAVLSFPLVATLFMYREFTLFDAQMTQHALIAYSFG 360 



LIGLIMIKVLAPGFYARQNIXXPVKIAIFTLICXQLMNLXFXGPLXXIGLSLAIGLGACI 
ttllMtllll llllllll :||llltltltl:llill I III lllltlltlllt 
LIGLIMIKVLASGFYARQNIKTPVKIAIFTLICTQLMNLAFIGPLKHAGLSLAIGLGACI 

NAGLLFYLLRRHGIYQPXQGLGSVLXQKCCSRSP 454
tllll|:|:|:llll:| Mil: :lltllll 
NAGLLFFLFRKHGIYRPGQGLGQPSWRKCCSRSP 454



420 



420 



An ORF20ng nucleotide sequence <SEQ ID 119> was predicted to encode a protein having amino
acid sequence <SEQ ID 120>:

1 MNMLGALAKV GSLTMVSRVL GFVRDTVIAR AFGAGMATDA FFVAFKLPNL

51 LRRVFAEGAF AQAFVPILAE YKETRSKEAT EAFIRHVAGM LSFVLIVVTA

101 LGILAAPWVI YVSAPGFTKD ADKFQLSISL LRITFPYILL ISLSSFVGSI 

151 LNSYHKFGIP AFTPTFLNIS FIVFALFFVP YFDPPVTALA WAVFVGGILQ 

201 LGFQLPWLAK LGFLKLPKLN FKDAAVNRVM KQMAPAILGV SVAQISLVIN 

251 TIFASYLQSG SVSWMYYADR MMELPGGVLG AALGTILLPT LSKHSANQDT 

301 EQFSALLDWG LRLCMLLTLP AAAGLAVLSF PLVATLFMYR EFTLFDAQMT 

351 QHALIAYSFG LIGLIMIKVL ASGFYARQNI KTPVKIAIFT LICTQLMNLA 

401 FIGPLKHAGL SLAIGLGACI NAGLLFFLFR KHGIYRPGQG LGQPSWRKCC 

451 SRSP* 

Further DNA sequence analysis revealed the following DNA sequence <SEQ ID 121>: 



1 ATGAATATGC TTGGAGCTTT GGCAAAAGTC GGCAGCCTGA CGATGGTGTC 

51 GCGCGTTTTG GGATTTGTGC GCGATACGGT CATTGCGCGG GCATTCGGCG 

101 CGGGTATGGC GACGGATGCG TTTTTTGTCG CGTTCAAACT GCCCAACCTG 

151 CTTCGCCGCG TGTTTGCGGA GGGGGCGTTT GCCCAAGCGT TTGTGCCGAT 

201 TTTGGCGGAA TATAAGGAAA CGCGTTCTAA AGAGGCGAcg gAGGCTTTTA 

251 TCCGCCACGt tgcgggAatg CTGTCGTTTG TGCTGATcgt cGttacCGCG 

301 CTGGGCATAC TTGCCGCgcc tTGGGTGATT TATGTTtccg CgcccGGCTT 

351 TACCAAAGAC GCGGACAAGT TCCAACTTTC CATCAGCCTG CTGCGGATTA 

401 CGTTTCCTTA TATATTATTG ATTTCTTTGT CTTCTTTTGT CGGCTCGATA

451 CTCAATTCCT ACCATAAGTT CGGCATTCCC GCGTTTACGC CCACGTTTTT

501 AAACATCTCT TTTATCGTAT TCGCACTGTT TTTCGTGCCG TATTTCGATC 

551 CGCCCGTTAC CGCGCTGGCG TGGGCGGTTT TTGTCGGCGG TATTTTGCAG 

601 CTCGGTTTCC AACTGCCGTG GCTGGCGAAA CTGGGCTTTT TGAAACTGCC 

651 CAAACTGAAT TTCAAAGATG CGGCGGTCAA CCGCGTCATG AAACAGATGG 

701 CGCCTGCGAT TTTGGGCGTG agcgTGGCGC AAATTTCTTT GgttATCAAC 

751 ACGATTTTCG CGTCTTATCT GCAATCGGGC AGCGTTTCAT GGATGTatta 

801 cgCCGACCGC ATGATGGAGc tgcgccGGGG CGTGCTGGGG GCTGCACTCG 

851 GTACAATTTT GCTGCCGACT TTGTCCAAAC ACTCGGCAAA CCAAGATACG 

901 GAACAGTTTT CCGCCCTGCT CGACTGGGGT TTGCGCCTGT GCATGCTGCT 

951 GACGCTGCCG GCGGCGGccg GACTGGCGGT ATTGTCGTTC CCGCTGGTGG 

1001 CGACGCTGTT TATGTACCGA GAATTCACGC TGTTTGACGC ACAAATGACG 

1051 CAACACGCGC TGATTGCCTA TTCTTTCGGT TTAATCGGTT TAATTATGAT 

1101 TAAAGTGTTG GCATCCGGCT TTTATGCGCG GCAAAACATC AAAACGCCCG 

1151 TCAAAATCGC CATCTTCACG CTCATCTGCA CGCAGTTGAT GAACCTCGCC 

1201 TTTATCGGTC CGTTGAAACA CGCCGGGCTT TCGCTCGCCA TCGGCCTGGG 

1251 CGCGTGCATC AACGCCGGAT TGTTGTTCTT CCTGTTGCGC AAACACGGTA 

1301 TTTACCGGCC cggcaggggt tgggcggcgt TCTTGGCGAA AATGCTGCTC 

1351 GCGCTCGCgG TGATGTGCGG CGGACTGTGG GCGGCGCAGG CTTGCCTGCC

1401 GTTCGAATGG GCGCACGCCG GCGGAATGCG GAAAGCGGGG CAGCTCTGCA 

1451 TCCTGATTGC CGTCGGCGGC GGACTGTATT TCGCATCTCT GGCGGCTTTG 

1501 GGCTTCCGTC CGCGCCATTT CAAACGCGTG GAAAGCTGA 

This encodes the following amino acid sequence <SEQ ID 122; ORF20ng-1>:



1 MNMLGALAKV GSLTMVSRVL GFVRDTVIAR AFGAGMATDA FFVAFKLPNL

51 LRRVFAEGAF AQAFVPILAE YKETRSKEAT EAFIRHVAGM LSFVLIVVTA

101 LGILAAPWVI YVSAPGFTKD ADKFQLSISL LRITFPYILL ISLSSFVGSI

151 LNSYHKFGIP AFTPTFLNIS FIVFALFFVP YFDPPVTALA WAVFVGGILQ

201 LGFQLPWLAK LGFLKLPKLN FKDAAVNRVM KQMAPAILGV SVAQISLVIN

251 TIFASYLQSG SVSWMYYADR MMELRRGVLG AALGTILLPT LSKHSANQDT

301 EQFSALLDWG LRLCMLLTLP AAAGLAVLSF PLVATLFMYR EFTLFDAQMT

351 QHALIAYSFG LIGLIMIKVL ASGFYARQNI KTPVKIAIFT LICTQLMNLA

401 FIGPLKHAGL SLAIGLGACI NAGLLFFLLR KHGIYRPGRG WAAFLAKMLL

451 ALAVMCGGLW AAQACLPFEW AHAGGMRKAG QLCILIAVGG GLYFASLAAL

501 GFRPRHFKRV ES*

ORF20ng-1 and ORF20-1 show 95.7% identity in 512 aa overlap:

                     10        20        30        40        50        60
orf20-1.pep  MNMLGALAKVGSLTMVSRVLGFVRDTVIARAFGAGMATDAFFVAFKLPNLLRRVFAEGAF
             ||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
orf20ng-1    MNMLGALAKVGSLTMVSRVLGFVRDTVIARAFGAGMATDAFFVAFKLPNLLRRVFAEGAF
                     10        20        30        40        50        60

                     70        80        90       100       110       120
orf20-1.pep  AQAFVPILAEYKETRSKEAAEAFIRHVAGMLSFVLVIVTALGILAAPWVIYVSAPGFAQD
             |||||||||||||||||||:|||||||||||||||::||||||||||||||||||||::|
orf20ng-1    AQAFVPILAEYKETRSKEATEAFIRHVAGMLSFVLIVVTALGILAAPWVIYVSAPGFTKD
                     70        80        90       100       110       120

                    130       140       150       160       170       180
orf20-1.pep  ADKFQLSIDLLRITFPYILLISLSSFVGSVLNSYHKFGIPAFTPTFLNVSFIVFALFFVP
             ||||||||:||||||||||||||||||||:||||||||||||||||||:|||||||||||
orf20ng-1    ADKFQLSISLLRITFPYILLISLSSFVGSILNSYHKFGIPAFTPTFLNISFIVFALFFVP
                    130       140       150       160       170       180

                    190       200       210       220       230       240
orf20-1.pep  YFDPPVTALAWAVFVGGILQLGFQLPWLAKLGFLKLPKLSFKDAAVNRVMKQMAPAILGV
             |||||||||||||||||||||||||||||||||||||||:||||||||||||||||||||
orf20ng-1    YFDPPVTALAWAVFVGGILQLGFQLPWLAKLGFLKLPKLNFKDAAVNRVMKQMAPAILGV
                    190       200       210       220       230       240

                    250       260       270       280       290       300
orf20-1.pep  SVAQVSLVINTIFASYLQSGSVSWMYYADRMMELPSGVLGAALGTILLPTLSKHSANQDT
             ||||:|||||||||||||||||||||||||||||  ||||||||||||||||||||||||
orf20ng-1    SVAQISLVINTIFASYLQSGSVSWMYYADRMMELRRGVLGAALGTILLPTLSKHSANQDT
                    250       260       270       280       290       300

                    310       320       330       340       350       360
orf20-1.pep  EQFSALLDWGLRLCMLLTLPAAVGLAVLSFPLVATLFMYREFTLFDAQMTQHALIAYSFG
             ||||||||||||||||||||||:|||||||||||||||||||||||||||||||||||||
orf20ng-1    EQFSALLDWGLRLCMLLTLPAAAGLAVLSFPLVATLFMYREFTLFDAQMTQHALIAYSFG
                    310       320       330       340       350       360

                    370       380       390       400       410       420
orf20-1.pep  LIGLIMIKVLAPGFYARQNIKTPVKIAIFTLICTQLMNLAFIGPLKHVGLSLAIGLGACI
             ||||||||||| |||||||||||||||||||||||||||||||||||:||||||||||||
orf20ng-1    LIGLIMIKVLASGFYARQNIKTPVKIAIFTLICTQLMNLAFIGPLKHAGLSLAIGLGACI
                    370       380       390       400       410       420

                    430       440       450       460       470       480
orf20-1.pep  NAGLLFYLLRRHGIYQPGKGWAAFLAKMLLSLAVMCGGLWAAQAYLPFEWAHAGGMRKAG
             ||||||:|||:||||:||:|||||||||||:||||||||||||| |||||||||||||||
orf20ng-1    NAGLLFFLLRKHGIYRPGRGWAAFLAKMLLALAVMCGGLWAAQACLPFEWAHAGGMRKAG
                    430       440       450       460       470       480

                    490       500       510
orf20-1.pep  QLCILIAVGGGLYFASLAALGFRPRHFKRVENX
             |||||||||||||||||||||||||||||||:|
orf20ng-1    QLCILIAVGGGLYFASLAALGFRPRHFKRVESX
                    490       500       510

In addition, ORF20ng-1 shows significant homology with a virulence factor of S.typhimurium:

sp|P37169|MVIN_SALTY VIRULENCE FACTOR MVIN pir||S40271 mviN protein - Salmonella
typhimurium gi|438252 (226133) mviB gene product [Salmonella typhimurium]
gnl|PID|d1005521 (D25292) ORF2 [Salmonella typhimurium] Length = 524

Score = 1573 (750.1 bits), Expect = 1.1e-220, Sum P(2) = 1.1e-220

Identities = 309/467 (66%), Positives = 368/467 (78%)

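Query: 1   MNMLGALAKVGSLTMVSRVLGFVRDTVIARAFGAGMATDAFFVAFKLPNLLRRVFAEGAF 60
           MN+L +LA V S+TM SRVLGF RD ++AR FGAGMATDAFFVAFKLPNLLRR+FAEGAF
Sbjct: 14

Query: 61  AQAFVPILAEYKETRSKEATEAFIRHVAGMLSFVLIVVTALGILAAPWVIYVSAPGFTKD 120
           +QAFVPILAEYK + +EAT F+ +V+G+L+ L VVT G+LAAPWVI V+APGF
Sbjct: 74

Query: 121 ADKFQLSISLLRITFPYILLISLSSFVGSILNSYHKFGIPAFTPTFLNISFIVFALFFVP 180
           ADKF L+ LLRITFPYILLISL+S VG+ILN++++F IPAF PTFLNIS I FALF P
Sbjct: 134

Query: 181 YFDPPVTALAWAVFVGGILQLGFQLPWLAKLGFLKLPKLNFKDAAVNRVMKQMAPAILGV 240
           YF+PPV ALAWAV VGG+LQL +QLP+L K+G L LP++NF+D    RV+KQM PAILGV
Sbjct: 194

Query: 241 SVAQISLVINTIFASYLQSGSVSWMYYADRMMELRRGVLGAALGTILLPTLSKHSANQDT 300
           SV+QISL+INTIFAS+L SGSVSWMYYADR+ME GVLG ALGTILLP+LSK A+ +
Sbjct: 254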
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Query: 301 EQFSALLDWGLRLCMLLTLPAAAGLAVLSFPLVATLFMYREFTLFDAQMTQHALIAYSFG 360 

+++ L+DWGLRLC LL LP+A L +L+ PL +LF Y +FT FDA MTQ ALIAYS G 
Sbjct: 314 DEYCRLMDWGLRLCFLLALPSAVALGILAKPLTVSLFQYGKFTAFDAAMTQRALIAYSVG 373 

Query: 361 LIGLIMIKVLASGFYARQNIKTPVKIAIFTLICTQLMNLAFIGPLKHAGLSLAIGLGACI 420

LIGLI++KVLA GFY+RQ+IKTPVKIAI TLI TQLMNLAFIGPLKHAGLSL+IGL AC+
Sbjct: 374 LIGLIVVKVLAPGFYSRQDIKTPVKIAIVTLIMTQLMNLAFIGPLKHAGLSLSIGLAACL 433

Query: 421 NAGLLFFLLRKHGIYRPGRGWXXXXXXXXXXXXVMCGGLWAAQACLP 467 

NA LL++ LRK 1+ P GW VM L+ +P 

Sbjct: 434 NASLLYWQLRKQNIFTPQPGWMWFLMRLIISVLVMAAVLFGVLHIMP 480 

Score = 70 (33.4 bits), Expect = 1.1e-220, Sum P(2) = 1.1e-220
Identities = 14/41 (34%), Positives = 23/41 (56%) 

Query: 469 EWAHAGGMRKAGQLCILIAVGGGLYFASLAALGFRPRHFKR 509 

EW+ + + +L ++ G YFA+LA LGF+ + F R 
Sbjct: 481 EWSQGSMLWRLLRLMAWIAGIAAYFAALAVLGFKVKEFVR 521 



Based on this analysis, including the homology with a virulence factor from S.typhimurium, it is
predicted that these proteins from N.meningitidis and N.gonorrhoeae, and their epitopes, could be
useful antigens for vaccines or diagnostics, or for raising antibodies.



Example 15 

The following partial DNA sequence was identified in N. meningitidis <SEQ ID 123>:

1 atGATTAAAA TCAAAAAAGG TCTAAACCTG CCCATCGCGG GCAGACCGGA 

51 GCAAGCCGTT tACGACGGCC CGGCCaTTAC CGAAGtCGCG TTGCTTGGCG 

101 AAGAATATGC CGGTATGCGC CCCTCGATGA AAGTCAAGGA AGGCGATGCC 

151 GTcAAAAAAG GCCAAGTGCT GTTTGAAGAC AAAAAGAATC CGGGCGTGGT 

201 GTTTACTGCG CCGGCTTCAG GcAAAATCGC CGCGATTCAC CGTGGCGAAA 

251 AGCGCGTACT TCAGTCAGTC GTGATTGCCG TTGAArGCAA CGAGGAAATC 

301 GAGTTTGAAC GCTACGCACC TGAAGCGCTG GCAAACTTAA GCGGCGAAGA 

351 AGTGCGCCGC AACCTGATCC AATCCGGTTT GTGGACTGCG CTGCGCACCC 

401 GTCCGTTCAG CAAAATTCCT GCCGTCGATG CCGAGCCGTT CGCCATCTTC 

451 GTCAATGCGA tGGACACCAA TCCG. . 

This corresponds to the amino acid sequence <SEQ ID 124; ORF22>: 

1 MIKIKKGLNL PIAGRPEQAV YDGPAITEVA LLGEEYAGMR PSMKVKEGDA 

51 VKKGQVLFED KKNPGVVFTA PASGKIAAIH RGEKRVLQSV VIAVEXNDEI

101 EFERYAPEAL ANLSGEEVRR NLIQSGLWTA LRTRPFSKIP AVDAEPFAIF 

151 VNAMDTNP. . 

Further work revealed the complete nucleotide sequence <SEQ ID 125>: 

1 ATGATTAAAA TCAAAAAAGG TCTAAACCTG CCCATCGCGG GCAGACCGGA 

51 GCAAGCCGTT TACGACGGCC CGGCCATTAC CGAAGTCGCG TTGCTTGGCG 

101 AAGAATATGC CGGTATGCGC CCCTCGATGA AAGTCAAGGA AGGCGATGCC 

151 GTCAAAAAAG GCCAAGTGCT GTTTGAAGAC AAAAAGAATC CGGGCGTGGT 

201 GTTTACTGCG CCGGCTTCAG GCAAAATCGC CGCGATTCAC CGTGGCGAAA 

251 AGCGCGTACT TCAGTCAGTC GTGATTGCCG TTGAAGGCAA CGAGGAAATC 

301 GAGTTTGAAC GCTACGCACC TGAAGCGCTG GCAAACTTAA GCGGCGAAGA 

351 AGTGCGCCGC AACCTGATCC AATCCGGTTT GTGGACTGCG CTGCGCACCC 

401 GTCCGTTCAG CAAAATTCCT GCCGTCGATG CCGAGCCGTT CGCCATCTTC 

451 GTCAATGCGA TGGACACCAA TCCGCTGGCT GCCGACCCTA CGGTCATTAT 

501 CAAAGAAGCC GCCGAGGATT TCAAACGCGG CCTGTTGGTA TTGAGCCGTT 

551 TGACCGAACG CAAAATCCAT GTTTGTAAGG CAGCTGGCGC AGACGTGCCG 

601 TCTGAAAATG CTGCCAACAT CGAAACACAT GAATTCGGCG GCCCGCATCC 

651 TGCCGGTTTG AGTGGCACGC ACATTCATTT CATCGAGCCG GTCGGCGCGA 

701 ATAAAACCGT GTGGACCATC AATTATCAAG ATGTAATTAC CATTGGCCGT 

751 TTGTTTGCAA CAGGCCGTCT GAACACCGAG CGCGTGATTG CCCTAGGTGG 

801 TTCTCAAGTC AACAAACCGC GCCTCTTGCG TACCGTTTTG GGTGCGAAAG 

851 TATCGCAAAT TACTGCGGGC GAATTGGTTG ACACAGACAA CCGCGTGATT 

901 TCCGGTTCGG TATTGAACGG CGCGATTACA CAAGGCGCGC ACGATTATTT

951 GGGACGCTAC CACAATCAGA TTTCCGTTAT CGAAGAAGGC CGCAGCAAAG

1001 AGCTGTTCGG CTGGGTTGCG CCGCAGCCGG ACAAATACTC CATCACGCGT 

1051 ACAACCCTCG GCCATTTCCT GAAAAACAAA CTCTTCAAGT TCAACACAGC

1101 CGTCAACGGC GGCGACCGCG CCATGGTGCC GATTGGTACT TACGAGCGCG 

1151 TGATGCCCTT GGATATCCTG CCCACCCTGC TTTTGCGCGA TTTAATCGTC 

1201 GGCGATACCG ACAGCGCGCA GGCATTGGGT TGCTTGGAAT TGGACGAAGA 

1251 AGACCTCGCT TTGTGCAGCT TCGTCTGCCC GGGCAAATAC GAATACGGCC 

1301 CGCTGTTGCG CAAAGTGCTG GAAACCATTG AGAAGGAAGG CTGA 

This corresponds to the amino acid sequence <SEQ ID 126; ORF22-1>:

1 MIKIKKGLNL PIAGRPEQAV YDGPAITEVA LLGEEYAGMR PSMKVKEGDA

51 VKKGQVLFED KKNPGVVFTA PASGKIAAIH RGEKRVLQSV VIAVEGNDEI

101 EFERYAPEAL ANLSGEEVRR NLIQSGLWTA LRTRPFSKIP AVDAEPFAIF

151 VNAMDTNPLA ADPTVIIKEA AEDFKRGLLV LSRLTERKIH VCKAAGADVP

201 SENAANIETH EFGGPHPAGL SGTHIHFIEP VGANKTVWTI NYQDVITIGR

251 LFATGRLNTE RVIALGGSQV NKPRLLRTVL GAKVSQITAG ELVDTDNRVI

301 SGSVLNGAIT QGAHDYLGRY HNQISVIEEG RSKELFGWVA PQPDKYSITR

351 TTLGHFLKNK LFKFNTAVNG GDRAMVPIGT YERVMPLDIL PTLLLRDLIV

401 GDTDSAQALG CLELDEEDLA LCSFVCPGKY EYGPLLRKVL ETIEKEG*



Further work identified the corresponding gene in strain A of N.meningitidis <SEQ ID 127>: 



1 ATGATTAAAA TCAAAAAAGG TCTAAACCTG CCCATCGCGG GCAGACCGGA

51 GCAAGTCATT TATGACGGGC CCGTCATTAC CGAAGTCGCG TTGCTTGGCG

101 AAGAATATGC CGGTATGCGC CCCTNGATGA AAGTCAAGGA AGGCGATGCC

151 GTCAAAAAAG GCCAAGTGCT GTTTGAAGAC AAAAAGNATC CGGGCGTGGT

201 GTTTACCGCG CCNGTTTCAG GCAAAATCGC CGCCATCCAT CGCGGCGAAA

251 AGCGCGTACT TCAGTCGGTC GTGATTGCCG TTGAAGGCAA CGACGAAATC

301 GAGTTCGAAC GCTACGCGCC CGAAGCGTTG GCAAACTTAA GCGGCGANGA

351 ANTNNGNNGC AATCTGATCC AATCCGGTTT GTGGACTGCG CTGCGTANCC

401 GTCCGTTCAG CAAAATCCCT GCCGTCGATG CCGAGCCGTT CGCCATCTTC

451 GTCAATGCGA TGGACACCAA TCCGCTNGCG GCAGACCCTG TGGTTGTGAT

501 CAAAGAAGCC GNCGANGATT TCAGACGANG TNTGCTGGTA TTGAGCCGTT

551 TGACCGAGCG TAAAATCCAT GTGTGTAAGG CAGCTGGCGC AGACGTGCCG

601 TCTGAAAATG CTGCCAACAT CGAAACACAT GAATTCGGCG GCCCGCATCC

651 GGCCGGTTTG AGTGGCACGC ACATTCATTT CATTGAGCCG GTCGGTGCAA

701 ACAAAACCGT TTGGACCATC AATTATCAAG ATGTAATTGC CATCGGACGT

751 TTGTTTGCAA CAGGCCGTCT GAACACCGAG CGCGTGATTG CTTTGGGTGG

801 TTCTCAAGTC AACAAACCAC GCCTCTTGCG TACCGTTTTG GGTGCGAAAG

851 TATCGCAAAT TACTGCGGGC GAATTGGTTG ACGCAGACAA CCGCGTGATT

901 TCCGGTTCGG TATTGAACGG CGCGATTACA CAAGGCGCGC ACGATTATTT

951 GGGACGCTAC CACAATCAGA TTTCCGTTAT CGAAGAAGGC CGCAGCAAAG

1001 AGCTGTTCGG CTGGGTTGCG CCGCAGCCGG ACAAATACTC CATCACGCGT

1051 ACGACCCTCG GCCATTTCCT GAAAAACAAA CTCTTCAAGT TCACGACAGC

1101 CGTCAACGGT GGCGACCGCG CCATGGTGCC GATTGGTACT TACGAGCGCG

1151 TAATGCCGCT AGACATCCTG CCTACCCTGC TTTTGCGCGA TTTAATCGTC

1201 GGCGATACCG ACAGCGCGCA AGCATTGGGT TGCTTGGAAT TGGACGAAGA

1251 AGACCTCGCT TTGTGCAGCT TCGTCTGCCC GGGCAAATAC GAATANGGCC

1301 CGCTGTTGCG TAAGGTGCTG GAAACCNTTG AGAAGGAAGG CTGA



This encodes a protein having amino acid sequence <SEQ ID 128; ORF22a>:

1 MIKIKKGLNL PIAGRPEQVI YDGPVITEVA LLGEEYAGMR PXMKVKEGDA

51 VKKGQVLFED KKXPGVVFTA PVSGKIAAIH RGEKRVLQSV VIAVEGNDEI

101 EFERYAPEAL ANLSGXEXXX NLIQSGLWTA LRXRPFSKIP AVDAEPFAIF

151 VNAMDTNPLA ADPVVVIKEA XXDFRRXXLV LSRLTERKIH VCKAAGADVP

201 SENAANIETH EFGGPHPAGL SGTHIHFIEP VGANKTVWTI NYQDVIAIGR 

251 LFATGRLNTE RVIALGGSQV NKPRLLRTVL GAKVSQITAG ELVDADNRVI 

301 SGSVLNGAIT QGAHDYLGRY HNQISVIEEG RSKELFGWVA PQPDKYSITR 

351 TTLGHFLKNK LFKFTTAVNG GDRAMVPIGT YERVMPLDIL PTLLLRDLIV 

401 GDTDSAQALG CLELDEEDLA LCSFVCPGKY EXGPLLRKVL ETXEKEG* 

The originally-identified partial strain B sequence (ORF22) shows 94.2% identity over a 158aa 
overlap with ORF22a: 



10 20 30 40 50 60 

orf22.pep  MIKIKKGLNLPIAGRPEQAVYDGPAITEVALLGEEYAGMRPSMKVKEGDAVKKGQVLFED
           ||||||||||||||||||::||||:||||||||||||||||| |||||||||||||||||
orf22a     MIKIKKGLNLPIAGRPEQVIYDGPVITEVALLGEEYAGMRPXMKVKEGDAVKKGQVLFED
                   10        20        30        40        50        60

70 80 90 100 110 120 

orf22.pep  KKNPGVVFTAPASGKIAAIHRGEKRVLQSVVIAVEXNDEIEFERYAPEALANLSGEEVRR
           || ||||||||:||||||||||||||||||||||| ||||||||||||||||||| |
orf22a     KKXPGVVFTAPVSGKIAAIHRGEKRVLQSVVIAVEGNDEIEFERYAPEALANLSGXEXXX
70 80 90 100 110 120 






130 140 150 

orf22.pep  NLIQSGLWTALRTRPFSKIPAVDAEPFAIFVNAMDTNP
           ||||||||||||:|||||||||||||||||||||||||
orf22a     NLIQSGLWTALRXRPFSKIPAVDAEPFAIFVNAMDTNPLAADPVVVIKEAXXDFRRXXLV

130 140 150 160 170 180 

The complete strain B sequence (ORF22-1) and ORF22a show 94.9% identity in 447 aa overlap: 

10 20 30 40 50 60 

orf 22a . pep MIKIKKGLNLPIAGRPEQVIYDGPVITEVALLGEEYAGMRPXMKVKEGDAVKKGQVLFED 
M I I I I I I I I I I I I I I I I :: I I i I : I I I I I I I I I t I t I I I t I I I I I I I I t I M I I I I I I 

orf 22-1 MIKIKKGLNLPIAGRPEQAVYDGPAITEVALLGEEYAGMRPSMKVKEGDAVKKGQVLFED

10 20 30 40 50 60 

70 80 90 100 110 120 

orf 22a . pep KKXPGVVFTAPVSGKIAAIHRGEKRVLQSVVIAVEGNDEIEFERYAPEALANLSGXEXXX

II I I II I I II : I II I I I I I t I II I I I I II II I I I I I I I I M I [ I I I I I I I I II I I 
orf 22-1 KKNPGVVFTAPASGKIAAIHRGEKRVLQSVVIAVEGNDEIEFERYAPEALANLSGEEVRR

70 80 90 100 110 120 

130 140 150 160 170 180 

orf 22a . pep NLIQSGLWTALRXRPFSKIPAVDAEPFAIFVNAMDTNPLAADPVVVIKEAXXDFRRXXLV
I II I I I I I I I I I : I I I t I I II t I I II I I I I I I I I II I I II II I : I : I III I I : I II 
orf 22-1 NLIQSGLWTALRTRPFSKIPAVDAEPFAIFVNAMDTNPLAADPTVIIKEAAEDFKRGLLV 

130 140 150 160 170 180 






190 200 210 220 230 240 

orf 22a . pep LSRLTERKIHVCKAAGADVPSENAANIETHEFGGPHPAGLSGTHIHFIEPVGANKTVWTI
i I I I II I I I II II I I I I I I I I I I I II I I I I t M I II I I I i I I I II i II I t I I II I I I I t I 
orf 22-1 LSRLTERKIHVCKAAGADVPSENAANIETHEFGGPHPAGLSGTHIHFIEPVGANKTVWTI 

190 200 210 220 230 240 






250 260 270 280 290 300 

orf 22a . pep NYQDVIAIGRLFATGRLNTERVIALGGSQVNKPRLLRTVLGAKVSQITAGELVDADNRVI 
M I I I I : M I I I I M II I I I I I I t I I I II I I I I II I I II I II t I t I II I II M I : I t I I I 
orf 22-1 NYQDVITIGRLFATGRLNTERVIALGGSQVNKPRLLRTVLGAKVSQITAGELVDTDNRVI 

250 260 270 280 290 300 

310 320 330 340 350 360 

orf 22a . pep SGSVLNGAITQGAHDYLGRYHNQISVIEEGRSKELFGWVAPQPDKYSITRTTLGHFLKNK 
I I I II I I I I I I I 1 I I I I t I I I I 1 I I I M II I t I I I I I I II I i I I I I I II M tl I t i I II i 
orf 22-1 SGSVLNGAITQGAHDYLGRYHNQISVIEEGRSKELFGWVAPQPDKYSITRTTLGHFLKNK 

310 320 330 340 350 360 

370 380 390 400 410 420 

orf 22a . pep LFKFTTAVNGGDRAMVPIGTYERVMPLDILPTLLLRDLIVGDTDSAQALGCLELDEEDLA 
I t II : II M I I I I I I t 1 II I I I I I t I I I II I I I II I I I I I I II I I I I I I I I I I I I I I I I I 
orf 22-1 LFKFNTAVNGGDRAMVPIGTYERVMPLDILPTLLLRDLIVGDTDSAQALGCLELDEEDLA 

370 380 390 400 410 420 






430 440 
or f 22a . pep LCSFVCPGKYEXGPLLRKVLETXEKEGX 

1 1 1 1 1 1 1 II II 1 1 1 II I II 1 1 lint 

orf 22-1 LCSFVCPGKYEYGPLLRKVLETIEKEGX

430 440 



Further work identified a partial gene sequence <SEQ ID 129> from N.gonorrhoeae, which
encodes the following amino acid sequence <SEQ ID 130; ORF22ng>:






1 MIKIKKGLNL PIAGRPEQVI YDGPAITEVA LLGEEYVGMR PSMKIKEGEA

51 VKKGQVLFED KKNPGVVFTA PASGKIAAIH RGEKRVLQSV VIAVEGNDEI

101 EFERYVPEAL AKLSSEKVRR NLIQSGLWTA LRTRPFSKIP AVDAEPFAIF

151 VNAMDTNPLA ADPTVIIKEA AEDFKRGLLV LSRLTERKIH VCKAAGADVP

201 SENAANIETH EFGGPHPAGL SGTHIHFIEP VGANKTVWTI NYQDVIAIGR

251 LFVTGRLNTE RVVALGGLQV NKPRLLRTVL GAKVSQLTAG ELVDADNRVI

301 SGSVLNGAIA QGAHDYLGRY HN*

Further work identified complete gonococcal gene <SEQ ID 131>: 



1 ATGATTAAAA TCAAAAAAGG TCTAAATCTG CCCATCGCGG GCAGACCGGA

51 GCAAGTCATT TATGACGGCC CGGCCATTAC CGAAGTCGCG TTGCTTGGCG

101 AAGAATATGT CGGCATGCGC CCCTCGATGA AAATCAAGGA AGGTGAAGCC

151 GTCAAAAAAG GCCAAGTGCT GTTTGAAGAC AAAAAGAATC CGGGCGTAGT

201 ATTTACTGCG CCGGCTTCAG GCAAAATCGC CGCTATTCAC CGTGGCGAAA

251 AGCGCGTACT TCAGTCAGTC GTGATTGCCG TTGAAGGCAA CGACGAAATC

301 GAGTTCGAAC GCTACGTACC TGAAGCGCTG GCAAAATTGA GCAGCGAAAA

351 AGTGCGCCGC AACCTGATTC AATCAGGCTT ATGGACTGCG CTTCGCACCC

401 GTCCGTTCAG CAAAATCCCT GCCGTAGATG CCGAGCCGTT CGCCATCTTC

451 GTCAATGCGA TGGACACCAA TCCGCTGGCT GCCGACCCTA CGGTCATCAT

501 CAAAGAAGCC GCCGAAGACT TCAAACGCGG CCTGTTGGTA TTGAGCCGCC

551 TGACCGAACG TAAAATCCAT GTGTGTAAAG CAGCAGGCGC AGACGTGCCG

601 TCTGAAAATG CTGCCAATAT CGAAACACAT GAATTTGGCG GCCCGCATCC

651 TGCCGGCTTG AGTGGCACGC ACATTCATTT CATCGAGCCA GTCGGCGCGA

701 ATAAAACCGT GTGGACCATC AATTATCAAG ACGTGATTGC TATCGGACGT

751 TTGTTCGTAA CAGGCCGTCT GAATACCGAG CGCGTGGTTG CCTTGGGCGG

801 CCTGCAAGTC AACAAACCGC GCCTCTTGCG TACCGTTTTG GGTGCGAAGG

851 TGTCTCAACT TACCGCCGGC GAATTGGTTG ACGCGGACAA CCGCGTGATT

901 TCCGGTTCGG TATTGAACGG TGCGATTGCA CAAGGCGCGC ATGATTATTT

951 GGGACGCTAC CACAATCAGA TTTCCGTTAT CGAAGAAGGC CGCAGCAAAG

1001 AGCTGTTCGG CTGGGTTGCG CCGCAGCCGG ACAAATACTC CATCACGCGC

1051 ACCACTCTCG GCCATTTCCT AAAAAACAAA CTCTTCAAGT TCACGACAGC

1101 CGTCAACGGC GGCGACCGCG CCATGGTACC GATCGGCACT TATGAGCGCG

1151 TAATGCCGTT GGACATCCTG CCTACCTTGC TTTTGCGCGA TTTAATCGTC

1201 GgCGATACCG ACAGCGCGCA GGCTTTGGGT TGCTTGGAAT TGGACGAAGA

1251 AGACCTCGCT TTGTGCAGCT TCGTCTGCCC GGGCAAATAC GAATACGGCC

1301 CGCTGTTGCG CAAAGTGCTG GAAACCATTG AGAAGGAAGG CTGA

This encodes a protein having amino acid sequence <SEQ ID 132; ORF22ng-1>:



1 MIKIKKGLNL PIAGRPEQVI YDGPAITEVA LLGEEYVGMR PSMKIKEGEA 

51 VKKGQVLFED KKNPGVVFTA PASGKIAAIH RGEKRVLQSV VIAVEGNDEI

101 EFERYVPEAL AKLSSEKVRR NLIQSGLWTA LRTRPFSKIP AVDAEPFAIF 

151 VNAMDTNPLA ADPTVIIKEA AEDFKRGLLV LSRLTERKIH VCKAAGADVP 

201 SENAANIETH EFGGPHPAGL SGTHIHFIEP VGANKTVWTI NYQDVIAIGR 

251 LFVTGRLNTE RVVALGGLQV NKPRLLRTVL GAKVSQLTAG ELVDADNRVI

301 SGSVLNGAIA QGAHDYLGRY HNQISVIEEG RSKELFGWVA PQPDKYSITR 

351 TTLGHFLKNK LFKFTTAVNG GDRAMVPIGT YERVMPLDIL PTLLLRDLIV 

401 GDTDSAQALG CLELDEEDLA LCSFVCPGKY EYGPLLRKVL ETIEKEG* 



The originally-identified partial strain B sequence (ORF22) shows 93.7% identity over a 158aa 
overlap with ORF22ng: 



orf22.pep  MIKIKKGLNLPIAGRPEQAVYDGPAITEVALLGEEYAGMRPSMKVKEGDAVKKGQVLFED 60
           ||||||||||||||||||::||||||||||||||||:|||||||:|||:|||||||||||
orf22ng    MIKIKKGLNLPIAGRPEQVIYDGPAITEVALLGEEYVGMRPSMKIKEGEAVKKGQVLFED 60

orf22.pep  KKNPGVVFTAPASGKIAAIHRGEKRVLQSVVIAVEXNDEIEFERYAPEALANLSGEEVRR 120
           ||||||||||||||||||||||||||||||||||| |||||||||:|||||:||:|:|||
orf22ng    KKNPGVVFTAPASGKIAAIHRGEKRVLQSVVIAVEGNDEIEFERYVPEALAKLSSEKVRR 120

orf22.pep  NLIQSGLWTALRTRPFSKIPAVDAEPFAIFVNAMDTNP 158
           ||||||||||||||||||||||||||||||||||||||
orf22ng    NLIQSGLWTALRTRPFSKIPAVDAEPFAIFVNAMDTNPLAADPTVIIKEAAEDFKRGLLV 180

The complete sequences from strain B (ORF22-1) and gonococcus (ORF22ng) show 96.2%
identity in 447 aa overlap:

                     10        20        30        40        50        60
orf22-1.pep  MIKIKKGLNLPIAGRPEQAVYDGPAITEVALLGEEYAGMRPSMKVKEGDAVKKGQVLFED
             ||||||||||||||||||::||||||||||||||||:|||||||:|||:|||||||||||
orf22ng-1    MIKIKKGLNLPIAGRPEQVIYDGPAITEVALLGEEYVGMRPSMKIKEGEAVKKGQVLFED
                     10        20        30        40        50        60

                     70        80        90       100       110       120
orf22-1.pep  KKNPGVVFTAPASGKIAAIHRGEKRVLQSVVIAVEGNDEIEFERYAPEALANLSGEEVRR
             |||||||||||||||||||||||||||||||||||||||||||||:|||||:||:|:|||
orf22ng-1    KKNPGVVFTAPASGKIAAIHRGEKRVLQSVVIAVEGNDEIEFERYVPEALAKLSSEKVRR
                     70        80        90       100       110       120

                    130       140       150       160       170       180
orf22-1.pep  NLIQSGLWTALRTRPFSKIPAVDAEPFAIFVNAMDTNPLAADPTVIIKEAAEDFKRGLLV
             ||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
orf22ng-1    NLIQSGLWTALRTRPFSKIPAVDAEPFAIFVNAMDTNPLAADPTVIIKEAAEDFKRGLLV
                    130       140       150       160       170       180

                    190       200       210       220       230       240
orf22-1.pep  LSRLTERKIHVCKAAGADVPSENAANIETHEFGGPHPAGLSGTHIHFIEPVGANKTVWTI
             ||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
orf22ng-1    LSRLTERKIHVCKAAGADVPSENAANIETHEFGGPHPAGLSGTHIHFIEPVGANKTVWTI
                    190       200       210       220       230       240

                    250       260       270       280       290       300
orf22-1.pep  NYQDVITIGRLFATGRLNTERVIALGGSQVNKPRLLRTVLGAKVSQITAGELVDTDNRVI
             ||||||:|||||:|||||||||:|||| ||||||||||||||||||:|||||||:|||||
orf22ng-1    NYQDVIAIGRLFVTGRLNTERVVALGGLQVNKPRLLRTVLGAKVSQLTAGELVDADNRVI
                    250       260       270       280       290       300

                    310       320       330       340       350       360
orf22-1.pep  SGSVLNGAITQGAHDYLGRYHNQISVIEEGRSKELFGWVAPQPDKYSITRTTLGHFLKNK
             |||||||||:||||||||||||||||||||||||||||||||||||||||||||||||||
orf22ng-1    SGSVLNGAIAQGAHDYLGRYHNQISVIEEGRSKELFGWVAPQPDKYSITRTTLGHFLKNK
                    310       320       330       340       350       360

                    370       380       390       400       410       420
orf22-1.pep  LFKFNTAVNGGDRAMVPIGTYERVMPLDILPTLLLRDLIVGDTDSAQALGCLELDEEDLA
             ||||:|||||||||||||||||||||||||||||||||||||||||||||||||||||||
orf22ng-1    LFKFTTAVNGGDRAMVPIGTYERVMPLDILPTLLLRDLIVGDTDSAQALGCLELDEEDLA
                    370       380       390       400       410       420

                    430       440
orf22-1.pep  LCSFVCPGKYEYGPLLRKVLETIEKEGX
             ||||||||||||||||||||||||||||
orf22ng-1    LCSFVCPGKYEYGPLLRKVLETIEKEGX
                    430       440

Computer analysis of these sequences gave the following results: 

Homology with 48kDa outer membrane protein of Actinobacillus pleuropneumoniae (accession number U24492).
ORF22 and this 48kDa protein show 72% aa identity in 158aa overlap: 

orf22    1 MIKIKKGLNLPIAGRPEQAVYDGPAITEVALLGEEYAGMRPSMKVKEGDAVKKGQVLFED 60
           MI IKKGL+LPIAG P Q +++G  + EVA+LGEEY GMRPSMKV+EGD VKKGQVLFED
48kDa    1 MITIKKGLDLPIAGTPAQVIHNGNTVNEVAMLGEEYVGMRPSMKVREGDVVKKGQVLFED 60

orf22   61 KKNPGVVFTAPASGKIAAIHRGEKRVLQSVVIAVEXNDEIEFERYAPEALANLSGEEVRR 120
           KKNPGVVFTAPASG +  I+RGEKRVLQSVVI VE +++I F RY    LA+LS E+V++
48kDa   61 KKNPGVVFTAPASGTVVTINRGEKRVLQSVVIKVEGDEQITFTRYEAAQLASLSAEQVKQ 120

orf22  121 NLIQSGLWTALRTRPFSKIPAVDAEPFAIFVNAMDTNP 158
           NLI+SGLWTA RTRPFSK+PA+DA P +IFVNAMDTNP
48kDa  121 NLIESGLWTAFRTRPFSKVPALDAIPSSIFVNAMDTNP 158

ORF22a also shows homology to the 48kDa Actinobacillus pleuropneumoniae protein: 

gi|1185395 (U24492) 48 kDa outer membrane protein [Actinobacillus pleuropneumoniae]
Length = 449

Score = 530 bits (1351), Expect = e-150
Identities = 274/450 (60%), Positives = 323/450 (70%), Gaps = 4/450 (0%)



Query: 1   MIKIKKGLNLPIAGRPEQVIYDGPVITEVALLGEEYAGMRPXMKVKEGDAVKKGQVLFED 60
           MI IKKGL+LPIAG P QVI++G + EVA+LGEEY GMRP MKV+EGD VKKGQVLFED
Sbjct: 1   MITIKKGLDLPIAGTPAQVIHNGNTVNEVAMLGEEYVGMRPSMKVREGDVVKKGQVLFED 60

Query: 61  KKXPGVVFTAPVSGKIAAIHRGEKRVLQSVVIAVEGNDEIEFERYAPEALANLSGXEXXX 120
           KK PGVVFTAP SG + I+RGEKRVLQSVVI VEG+++I F RY LA+LS +
Sbjct: 61  KKNPGVVFTAPASGTVVTINRGEKRVLQSVVIKVEGDEQITFTRYEAAQLASLSAEQVKQ 120

Query: 121 NLIQSGLWTALRXRPFSKIPAVDAEPFAIFVNAMDTNPLAADPVVVIKEAXXDFRRXXLV 180
           NLI+SGLWTA R RPFSK+PA+DA P +IFVNAMDTNPLAADP VV+KE DF+ V
Sbjct: 121 NLIESGLWTAFRTRPFSKVPALDAIPSSIFVNAMDTNPLAADPEVVLKEYETDFKDGLTV 180

Query: 181 LSRL--TERKIHVCKAAGADVP-SENAANIETHEFGGPHPAGLSGTHIHFIEPVGANKTV 237
           L+RL ++ +++CK A +++P S I F G HPAGL GTHIHF++PVGA K V
Sbjct: 181 LTRLFNGQKPVYLCKDADSNIPLSPAIEGITIKSFSGVHPAGLVGTHIHFVDPVGATKQV 240

Query: 238 WTINYQDVIAIGRLFATGRLNTERVIALGGSQVNKPRLLRTVLGAKVSQITAGELVDADN 297
           W +NYQDVIAIG+LF TG L T+R+I+L G QV PRL+RT LGA +SQ+TA EL +N
Sbjct: 241 WHLNYQDVIAIGKLFTTGELFTDRIISLAGPQVKNPRLVRTRLGANLSQLTANELNAGEN 300

Query: 298 RVISGSVLNGAITQGAHDYLGRYHNQISVIEEGRSKELFGWVAPQPDKYSITRTTLGHFL 357
           RVISGSVL+GA G DYLGRY Q+SV+ EGR KELFGW+ P DK+SITRT LGHF
Sbjct: 301 RVISGSVLSGATAAGPVDYLGRYALQVSVLAEGREKELFGWIMPGSDKFSITRTVLGHFG 360

Query: 358 KNKLFKFTTAVNGGDRAMVPIGTYERVMXXXXXXXXXXXXXXVGDTDSAQXXXXXXXXXX 417
           K KLF FTTAV+GG+RAMVPIG YERVM GDTDSAQ
Sbjct: 361 K-KLFNFTTAVHGGERAMVPIGAYERVMPLDIIPTLLLRDLAAGDTDSAQNLGCLELDEE 419

Query: 418 XXXXXSFVCPGKYEXGPLLRKVLETXEKEG 447
           ++VCPGK GP+LR LE EKEG

ORF22ng-l also shows homology with the OMP from A.pleuropneumoniae: 

gi|1185395 (U24492) 48 kDa outer membrane protein [Actinobacillus
pleuropneumoniae] Length = 449
Score = 555 bits (1414), Expect = e-157

Identities = 284/450 (63%), Positives = 337/450 (74%), Gaps = 4/450 (0%) 

Query: 27 MIKIKKGLNLPIAGRPEQVIYDGPAITEVALLGEEYVGMRPSMKIKEGEAVKKGQVLFED 86 

MI IKKGL+LPIAG P QVI++G + EVA+LGEEYVGMRPSMK++EG+ VKKGQVLFED 
Sbjct: 1 MITIKKGLDLPIAGTPAQVIHNGNTVNEVAMLGEEYVGMRPSMKVREGDWKKGQVLFED 60 

Query: 87 KKNPGWFTAPASGKIAAIHRGEKRVLQSWIAVEGNDEIEFERYVPEALAKLSSEKVRR 146 

KKNPGWFTAPASG + I+RGEKRVLQSWI VEG+++I F RY LA LS+E+V++ 
Sbjct: 61 KKNPGWFTAPTVSGTWTINRGEKRVLQSWIKVEGDEQITFTRYEAAQLASLSAEQVKQ 120

Query: 147 NLIQSGLWTALRTRPFSKIPAVDAEPFAIFVNAMDTNPLAADPTVIIKEAAEDFKRGLLV 206 

NLI+SGLWTA RTRPFSK+PA+DA P +IFVNAMDTNPLAADP V++KE DFK GL V 
Sbjct: 121 NLIESGLWTAFRTRPFSKVPALDAIPSSIFVNAMDTNPLAADPEWLKEYETDFKDGLTV 180 

Query: 207 LSRL--TERKIHVCKAAGADVP-SENAANIETHEFGGPHPAGLSGTHIHFIEPVGANKTV 263

L+RL ++ +++CK A +++P S I F G HPAGL GTHIHF++PVGA K V 

Sbjct: 181 LTRLFNGQKPVYLCKDADSNIPLSPAIEGITIKSFSGVHPAGLVGTHIHFVDPVGATKQV 240 

Query: 264 WTINYQDVIAIGRLFVTGRLNTERWALGGLQVNKPRLLRTVLGAKVSQLTAGELVDADN 323 

W +NYQDVIAIG+LF TG L T+R+++L G QV PRL+RT LGA +SQLTA EL +N 
Sbjct: 241 WHLNYQDVIAIGKLFTTGELFTDRIISLAGPQVKNPRLVRTRLGANLSQLTANELNAGEN 300 

Query: 324 RVISGSVLNGAIAQGAHDYLGRYHNQISVIEEGRSKELFGWVAPQPDKYSITRTTLGHFL 383 

RVISGSVL+GA A G DYLGRY Q+SV+ EGR KELFGW+ P DK+SITRT LGHF 
Sbjct: 301 RVISGSVLSGATAAGPVDYLGRYALQVSVLAEGREKELFGWIMPGSDKFSITRTVLGHFG 360 

Query: 384 KNKLFKFTTAVNGGDRAMVPIGTYERVMXXXXXXXXXXXXXXVGDTDSAQXXXXXXXXXX 443

K KLF FTTAV+GG+RAMVPIG YERVM GDTDSAQ 
Sbjct: 361 K-KLFNFTTAVHGGERAMVPIGAYERVMPLDIIPTLLLRDLAAGDTDSAQNLGCLELDEE 419 



Query: 444 XXXXXSFVCPGKYEYGPLLRKVLETIEKEG 473 

++VCPGK YGP+LR LE IEKEG
Sbjct: 420 DLALCTYVCPGKNNYGPMLRAALEKIEKEG 449 
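
The summary line of such a report can be checked against its own counts. The sketch below parses the "Identities / Positives / Gaps" line of the ORF22ng-1 report and recomputes each percentage. It assumes that the reported percentage is the integer part of the ratio, which is inferred from the numbers shown here and is not stated in the report itself; `parse_blast_summary` and `truncated_percent` are hypothetical helper names.

```python
import re

SUMMARY = "Identities = 284/450 (63%), Positives = 337/450 (74%), Gaps = 4/450 (0%)"


def parse_blast_summary(line: str) -> dict:
    """Map each field name to (numerator, denominator, reported percent)."""
    pattern = re.compile(r"(\w+) = (\d+)/(\d+) \((\d+)%\)")
    return {
        name: (int(num), int(den), int(pct))
        for name, num, den, pct in pattern.findall(line)
    }


def truncated_percent(numerator: int, denominator: int) -> int:
    """Integer part of the percentage (assumed rounding rule)."""
    return 100 * numerator // denominator


fields = parse_blast_summary(SUMMARY)
for name, (num, den, reported) in fields.items():
    recomputed = truncated_percent(num, den)
    status = "ok" if recomputed == reported else "differs"
    print(f"{name}: {num}/{den} reported {reported}%, recomputed {recomputed}% ({status})")
```

A field marked "differs" would point to a transcription problem in either the counts or the percentage.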




Based on this analysis, including the homology with the outer membrane protein of Actinobacillus
pleuropneumoniae, it was predicted that these proteins from N.meningitidis and N.gonorrhoeae,
and their epitopes, could be useful antigens for vaccines or diagnostics, or for raising antibodies.

ORF22-1 (35.4kDa) was cloned in pET and pGex vectors and expressed in E.coli, as described
above. The products of protein expression and purification were analyzed by SDS-PAGE. Figure
5A shows the results of affinity purification of the GST-fusion protein, and Figure 5B shows the
results of expression of the His-fusion in E.coli. Purified GST-fusion protein was used to immunise
mice, whose sera were used for ELISA (positive result) and FACS analysis (Figure 5C). These
experiments confirm that ORF22-1 is a surface-exposed protein, and that it is a useful immunogen.

Example 16 

The following partial DNA sequence was identified in N.meningitidis <SEQ ID 133>:

1 ..GCGnCGnAAA TCATCCATCC CO. .nACGTC GTAGGCCCTG AAGCCAACTG 

51 GTTTTTTATG GTAGCCAGTA CGTTTGTGAT TGCTTTGATT GGTTATTTTG 

101 TTACTGAAAA AATCGTCGAA CCGCAATTGG GCCCTTATCA ATCAGATTTG 

151 TCACAAGAAG AAAAAGACAT TCGGCATTCC AATGAAATCA CGCCTTTGGA 

201 ATATAAAGGA TTAATTTGGG CTGGCGTGGT GTTTGTTGCC TTATCCGCCC 

251 TATTGGCTTG GAGCATCGTC CCTGCCGACG GTATTTTGCG TCATCCTGAA 

301 ACAGGATTGG TTTCCGGTTC GCCGTTTTTA AAATCGATTG TTGTTTTTAT 

351 TTTCTTGTTG TTTGCACTGC CGGGCATTGT TTATGGCCGG GTAACCCGAA 

401 GTTTGCGCGG CGAACAGGAA GTCGTTAATG CGmyGGCCGA ATCGATGAGT 

451 ACTCTGGsGC TTTmTTTGsw CAkcATCTTT TTTGCCGCAC AGTTTGTCGC 

501 ATTTTTTAAT TGGACGAATA TTGGGCAATA TATTGCCGTT AAAGGGGCGA 

551 CGTTCTTAAA AGAAGTCGGC TTGGGCGGCA GCGTGTTGTT TATCGGTTTT 

601 ATTTTAATTT GTGCTTTTAT CAATCTGATG ATAGGCTCCG CCTCCGCGCA 

651 ATGGGCGGTA ACTGCGCCGA TTTTCGTCCC TATGCTGATG TTGGCCGGCT 

701 ACGCGCCCGA AGTCATTCAA GCCGCTTACC GCATCGGTGA TTCCGTTACC 

751 AATATTATTA CGCCGATGAT GAGTTATTTC GGGCTGATTA TGGCGACGGT 

801 GrkCmramTAC AAAAAAGATG CGGGCGTGGG TaCGcTGATT wCTATGATGT 

851 TGCCGTATTC CGCTTTCTTC TTGATTGCgT GGATTGCCTT ATTCTGCATT 

901 TGGGTATTTg TTTTGGGCCT GCCCGTCGGT CCCGGCGCGC CCACATTCTA 

951 TCCCGCACCT TAA 

This corresponds to the amino acid sequence <SEQ ID 134; ORF12>:

1 ..AXXIIHPXXV VGPEANWFFM VASTFVIALI GYFVTEKIVE PQLGPYQSDL 

51 SQEEKDIRHS NEITPLEYKG LIWAGWFVA LSALLAWSIV PADGILRHPE 

101 TGLVSGSPFL KSIWFIFLL FALPGIVYGR VTRSLRGEQE WNAXAESMS 

151 TLXLXLXXIF FAAQFVAFFN WTNIGQYIAV KGATFLKEVG LGGSVLFIGF 

201 ILICAFINLM IGSASAQWAV TAPIFVPMLM LAGYAPEVIQ AAYRIGDSVT 

251 NIITPMMSYF GLIMATVXXY KKDAGVGTLI XMMLPYSAFF LIAWIALFCI 

301 WVFVLGLPVG PGAPTFYPAP * 
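
The correspondence between a DNA listing and its amino acid sequence can be reproduced by reading the DNA three bases at a time. The sketch below assumes the standard genetic code and reports any codon containing an undetermined base as X, which is an assumed convention that matches the X residues seen in the partial sequence; `translate` is a hypothetical helper, not the software used here. The first test string is the start of SEQ ID 133 and the second is the start of SEQ ID 135.

```python
BASES = "TCAG"
# Standard genetic code, codons ordered TTT, TTC, TTA, TTG, TCT, ... GGG.
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    a + b + c: AMINO_ACIDS[16 * i + 4 * j + k]
    for i, a in enumerate(BASES)
    for j, b in enumerate(BASES)
    for k, c in enumerate(BASES)
}


def translate(dna: str) -> str:
    """Translate DNA in frame 1; whitespace is ignored, '*' marks a stop.

    Any codon with a character other than A, C, G or T becomes 'X'.
    A trailing incomplete codon is dropped.
    """
    dna = "".join(dna.split()).upper()
    protein = []
    for i in range(0, len(dna) - 2, 3):
        protein.append(CODON_TABLE.get(dna[i:i + 3], "X"))
    return "".join(protein)


partial_start = translate("GCGnCGnAAA TCATCCAT")
complete_start = translate("ATGAGTCAAA CCGATACGCA ACGGGACGGA")
print(partial_start)
print(complete_start)
```

A translator that resolved ambiguity codes would do better on codons such as CCN, which encode proline whatever the third base; this sketch does not attempt that.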

Further sequence analysis revealed the complete DNA sequence <SEQ ID 135> to be: 

1 ATGAGTCAAA CCGATACGCA ACGGGACGGA CGATTTTTAC GCACAGTCGA 

51 ATGGCTGGGC AATATGTTGC CGCATCCGGT TACGCTTTTT ATTATTTTCA 

101 TTGTGTTATT GCTGATTGCC TCTGCCGTCG GTGCGTATTT CGGACTATCC 

151 GTCCCCGATC CGCGCCCTGT TGGTGCGAAA GGACGTGCCG ATGACGGTTT 

201 GATTTACATT GTCAGCCTGC TCAATGCCGA CGGTTTTATC AAAATCCTGA 

251 CGCATACCGT TAAAAATTTC ACCGGTTTCG CGCCGTTGGG AACGGTGTTG 

301 GTTTCTTTAT TGGGCGTGGG GATTGCGGAA AAATCGGGCT TGATTTCCGC 

351 ATTAATGCGC TTATTGCTCA CAAAATCGCC ACGCAAACTC ACTACTTTTA 

401 TGGTTGTTTT TACAGGGATT TTATCTAATA CCGCTTCTGA ATTGGGCTAT 

451 GTCGTCCTAA TCCCTTTGTC CGCCATCATC TTTCATTCCC TCGGCCGCCA 

501 TCCGCTTGCC GGTCTGGCTG CGGCTTTCGC CGGCGTTTCG GGCGGTTATT 

551 CGGCCAATCT GTTCTTAGGC ACAATCGATC CGCTCTTGGC AGGCATCACC

601 CAACAGGCGG CGCAAATCAT CCATCCCGAC TACGTCGTAG GCCCTGAAGC

651 CAACTGGTTT TTTATGGTAG CCAGTACGTT TGTGATTGCT TTGATTGGTT

701 ATTTTGTTAC TGAAAAAATC GTCGAACCGC AATTGGGCCC TTATCAATCA

751 GATTTGTCAC AAGAAGAAAA AGACATTCGG CATTCCAATG AAATCACGCC

801 TTTGGAATAT AAAGGATTAA TTTGGGCTGG CGTGGTGTTT GTTGCCTTAT

851 CCGCCCTATT GGCTTGGAGC ATCGTCCCTG CCGACGGTAT TTTGCGTCAT

901 CCTGAAACAG GATTGGTTTC CGGTTCGCCG TTTTTAAAAT CGATTGTTGT

951 TTTTATTTTC TTGTTGTTTG CACTGCCGGG CATTGTTTAT GGCCGGGTAA

1001 CCCGAAGTTT GCGCGGCGAA CAGGAAGTCG TTAATGCGAT GGCCGAATCG

1051 ATGAGTACTC TGGGGCTTTA TTTGGTCATC ATCTTTTTTG CCGCACAGTT

1101 TGTCGCATTT TTTAATTGGA CGAATATTGG GCAATATATT GCCGTTAAAG

1151 GGGCGACGTT CTTAAAAGAA GTCGGCTTGG GCGGCAGCGT GTTGTTTATC

1201 GGTTTTATTT TAATTTGTGC TTTTATCAAT CTGATGATAG GCTCCGCCTC

1251 CGCGCAATGG GCGGTAACTG CGCCGATTTT CGTCCCTATG CTGATGTTGG

1301 CCGGCTACGC GCCCGAAGTC ATTCAAGCCG CTTACCGCAT CGGTGATTCC

1351 GTTACCAATA TTATTACGCC GATGATGAGT TATTTCGGGC TGATTATGGC

1401 GACGGTGATC AAATACAAAA AAGATGCGGG CGTGGGTACG CTGATTTCTA

1451 TGATGTTGCC GTATTCCGCT TTCTTCTTGA TTGCGTGGAT TGCCTTATTC

1501 TGCATTTGGG TATTTGTTTT GGGCCTGCCC GTCGGTCCCG GCGCGCCCAC

1551 ATTCTATCCC GCACCTTAA


This corresponds to the amino acid sequence <SEQ ID 136; ORF12-1>:



1 MSQTDTQRDG RFLRTVEWLG NMLPHPVTLF IIFIVLLLIA SAVGAYFGLS

51 VPDPRPVGAK GRADDGLIYI VSLLNADGFI KILTHTVKNF TGFAPLGTVL

101 VSLLGVGIAE KSGLISALMR LLLTKSPRKL TTFMWFTGI LSNTASELGY

151 WLIPLSAII FHSLGRHPLA GLAAAFAGVS GGYSANLFLG TIDPLLAGIT

201 QQAAQIIHPD YWGPEANWF FMVASTFVIA LIGYFVTEKI VEPQLGPYQS

251 DLSQEEKDIR HSNEITPLEY KGLIWAGWF VALSALLAWS IVPADGILRH

301 PETGLVSGSP FLKSIWFIF LLFALPGIVY GRVTRSLRGE QEWNAMAES

351 MSTLGLYLVI IFFAAQFVAF FNWTNIGQYI AVKGATFLKE VGLGGSVLFI

401 GFILICAFIN LMIGSASAQW AVTAPIFVPM LMLAGYAPEV IQAAYRIGDS

451 VTNIITPMMS YFGLIMATVI KYKKDAGVGT LISMMLPYSA FFLIAWIALF

501 CIWVFVLGLP VGPGAPTFYP AP*



Computer analysis of this amino acid sequence gave the following results: 
Homology with a predicted ORF from N. meningitidis (strain A)

ORF12 shows 96.3% identity over a 320aa overlap with an ORF (ORF12a) from strain A of N.
meningitidis:

10 20 30 

orf 12 .pep AXXIIHPXXWGPEANWFFMVASTFVIALI 

I MM 1 1 i 1 1 1 i I n I I t I I I I I It i 

orf 12a AAAFAGVSGGYSANLFLGTIDPLLAGITQQAAQIIHPDYWGPEANWFFMVASTFVIALI 
180 190 200 210 220 230 

40 50 60 70 80 90 

orf 12 . pep GYFVTEKIVEPQLGPYQSDLSQEEKDIRHSNEITPLEYKGLIWAGWFVALSALLAWSIV 
IMIIIIMtllllilMlillllltllllllllllllMtllllMlililllilllll 
orf 12a GYFVTEKIVEPQLGPYQSDLSQEEKDIRHSNEITPLEYKGLIWAGWFVALSALLAWSIV 
240 250 260 270 280 290 

100 110 120 130 140 150 

orf 12 . pep PADGILRHPETGLVSGSPFLKSIWFIFLLFALPGIVYGRVTRSLRGEQEWNAXAESMS 

I I I I M I I I I I I M I I I I I I i I M I I I I i M I I M I I I I I t I I I I I I I I I I I I I I I t I I 
orf 12a PADGILRHPETGLVSGSPFLKSIWFIFLLFALPGIVYGRVTRSLRGEQEWNAMAESMS

300 310 320 330 340 350 

160 170 180 190 200 210 

orf 12 . pep TLXLXLXXIFFAAQFVAFFNWTNIGQYIAVKGATFLKEVGLGGSVLFIGFILICAFINLM 

II I I M I II I I II I I I M I II I II t I I I II II I I II II 11 I II I I I I II I II I I II 
orf 12a TLGLYLVIIFFAAQFVAFFNWTNIGQYIAVKGATFLKEVGLGGSVLFIGFILICAFINLM 

360 370 380 390 400 410 

220 230 240 250 260 270 

orf 12 . pep IGSASAQWAVTAPIFVPMLMLAGYAPEVIQAAYRIGDSVTNIITPMMSYFGLIMATVXXY 
I I I I I I I M 1 I I I t I I M I I I I I II I I I II I I I II I I I I I I II I M I I M I t I I t II I 
orf 12a IGSASAQWAVTAPIFVPMLMLAGYAPEVIQAAYRIGDSVTNIITPMMSYFGLIMATVIKY
420 430 440 450 460 470



280 290 300 310 320 

orf 12 . pep KKDAGVGTLIXMMLPYSAFFLIAWIALFCIWVFVLGLPVGPGAPTFYPAPX 
I I I I I I I I I I f i I I I I I I I M I I I I I I I I I I I I I I I I I I I I I I M I I I I I 
orf 12a KKDAGVGTLISMMLPYSAFFLIAWIALFCIWVFVLGLPVGPGAPTFYPAPX 
480 490 500 510 520 

The complete length ORF12a nucleotide sequence <SEQ ID 137> is:



1 ATGAGTCAAA CCGATACGCA ACGGGACGGA CGATTTTTAC GCACAGTCGA

51 ATGGCTGGGC AATATGTTGC CGCACCCGGT TACGCTTTTT ATTATTTTCA

101 TTGTGTTATT GCTGATTGCC TCTGCCGCCG GTGCGTATTT CGGACTATCC

151 GTCCCCGATC CGCGCCCTGT TGGTGCGAAA GGACGTGCCG ATGACGGTTT

201 GATTCACGTT GTCAGCCTGC TCGATGCTGA CGGTTTGATC AAAATCCTGA

251 CGCATACCGT TAAAAATTTC ACCGGTTTCG CGCCGTTGGG AACGGTGTTG

301 GTTTCTTTAT TGGGCGTGGG GATTGCGGAA AAATCGGGCT TGATTTCCGC

351 ATTAATGCGC TTATTGCTCA CAAAATCTCC ACGCAAACTC ACTACTTTTA

401 TGGTTGTTTT TACAGGGATT TTATCTAATA CCGCTTCTGA ATTGGGCTAT

451 GTCGTCCTAA TCCCTTTGTC CGCCATCATC TTTCATTCCC TCGGCCGCCA

501 TCCGCTTGCC GGTCTGGCTG CGGCTTTCGC CGGCGTTTCG GGCGGTTATT

551 CGGCCAATCT GTTCTTAGGC ACAATCGATC CGCTCTTGGC AGGCATCACC

601 CAACAGGCGG CGCAAATCAT CCATCCCGAC TACGTCGTAG GCCCTGAAGC

651 CAACTGGTTT TTTATGGTAG CCAGTACGTT TGTGATTGCT TTGATTGGTT

701 ATTTTGTTAC TGAAAAAATC GTCGAACCGC AATTGGGCCC TTATCAATCA

751 GATTTGTCAC AAGAAGAAAA AGACATTCGA CATTCCAATG AAATCACGCC

801 TTTGGAATAT AAAGGATTAA TTTGGGCTGG CGTGGTGTTT GTTGCCTTAT

851 CCGCCCTATT GGCTTGGAGC ATCGTCCCTG CCGACGGTAT TTTGCGTCAT

901 CCTGAAACAG GATTGGTTTC CGGTTCGCCG TTTTTAAAAT CAATTGTTGT

951 TTTTATTTTC TTGTTGTTTG CACTGCCGGG CATTGTTTAT GGCCGGGTAA

1001 CCCGAAGTTT GCGCGGCGAA CAGGAAGTCG TTAATGCGAT GGCCGAATCG

1051 ATGAGTACTC TGGGGCTTTA TTTGGTCATC ATCTTTTTTG CCGCACAGTT

1101 TGTCGCATTT TTTAATTGGA CGAATATTGG GCAATATATT GCCGTTAAAG

1151 GGGCGACGTT CTTAAAAGAA GTCGGCTTGG GCGGCAGCGT GTTGTTTATC

1201 GGTTTTATTT TAATTTGTGC TTTTATCAAT CTGATGATAG GCTCCGCCTC

1251 CGCGCAATGG GCGGTAACTG CGCCGATTTT CGTCCCTATG CTGATGTTGG

1301 CCGGCTACGC GCCCGAAGTC ATTCAAGCCG CTTACCGCAT CGGTGATTCC

1351 GTTACCAATA TTATTACGCC GATGATGAGT TATTTCGGGC TGATTATGGC

1401 GACGGTGATC AAATACAAAA AAGATGCGGG CGTGGGTACG CTGATTTCTA

1451 TGATGTTGCC GTATTCCGCT TTCTTCTTGA TTGCGTGGAT TGCCTTATTC

1501 TGCATTTGGG TATTTGTTTT GGGCCTGCCC GTCGGTCCCG GCGCGCCCAC

1551 ATTCTATCCC GCACCTTAA



This encodes a protein having amino acid sequence <SEQ ID 138>: 



1 MSQTDTQRDG RFLRTVEWLG NMLPHPVTLF XIFIVLLLIA SAAGAYFGLS

51 VPDPRPVGAK GRADDGLIHV VSLLDADGLI KILTHTVKNF TGFAPLGTVL

101 VSLLGVGIAE KSGLISALMR LLLTKSPRKL TTFMWFTGI LSNTASELGY

151 WLIPLSAII FHSLGRHPLA GLAAAFAGVS GGYSANLFLG TIDPLLAGIT

201 QQAAQIIHPD YWGPEANWF FMVASTFVIA LIGYFVTEKI VEPQLGPYQS

251 DLSQEEKDIR HSNEITPLEY KGLIWAGWF VALSALLAWS IVPADGILRH

301 PETGLVSGSP FLKSIWFIF LLFALPGIVY GRVTRSLRGE QEWNAMAES

351 MSTLGLYLVI IFFAAQFVAF FNWTNIGQYI AVKGATFLKE VGLGGSVLFI

401 GFILICAFIN LMIGSASAQW AVTAPIFVPM LMLAGYAPEV IQAAYRIGDS

451 VTNIITPMMS YFGLIMATVI KYKKDAGVGT LISMMLPYSA FFLIAWIALF

501 CIWVFVLGLP VGPGAPTFYP AP*



ORF12a and ORF12-1 show 99.0% identity in 522 aa overlap:

10 20 30 40 50 60

orf 12a . pep MSQTDTQRDGRFLRTVEWLGNMLPHPVTLFIIFIVLLLIASAAGAYFGLSVPDPRPVGAK
I I I I t i I I I I I I i I I i I I I I I I I I I I I M I I i I t I I M I I I I : I I I I I M I t I I I 11 I I t
orf 12-1 MSQTDTQRDGRFLRTVEWLGNMLPHPVTLFIIFIVLLLIASAVGAYFGLSVPDPRPVGAK
10 20 30 40 50 60

70 80 90 100 110 120

orf 12a . pep GRADDGLIHWSLLDADGLIKILTHTVKNFTGFAPLGTVLVSLLGVGIAEKSGLISALMR
[ I M I I M :: I I I I : I t I : I I I I M I I i I I I I M I I I I I I I M I M I M I I I I M I t I It
orf 12-1 GRADDGLIYIVSLLNADGFIKILTHTVKNFTGFAPLGTVLVSLLGVGIAEKSGLISALMR
70 80 90 100 110 120

130 140 150 160 170 180 

orf 12a . pep LLLTKSPRKLTTFMVVFTGXLSNTASELGYVVLIPLSAIIFHSLGRHPLAGLAAAFAGVS 
I I I I I I I i I I I I I I I M I i I I I i I i I I i I I 1 I t I I i I It I I I I I M t M I I M t I t i M I 
orf 12-1 LLLTKSPRKLTTFMWFTGILSNTASELGYWLIPLSAIIFHSLGRHPLAGLAAAFAGVS 

130 140 150 160 170 180 

190 200 210 220 230 240 

orf 12a . pep GGYSANLFLGTIDPLLAGITQQAAQIIHPDYWGPEANWFFMVASTFVIALIGYEVTEKI 
[ M i I t I I I I I I I I I i I I i I [ [ I I I I I I I M t M I I t M I I I I i I I I I I I I I M I I I i I I 
orf 12-1 GGYSANLFLGTIDPLLAGITQQAAQIIHPDYWGPEANWFFMVASTFVIALIGYFVTEKI 

190 200 210 220 230 240 

250 260 270 280 290 300 

or f 12a . pep VEPQLGPYQSDLSQEEKDIRHSNEITPLEYKGLIWAGWFVALSALLAWSIVPADGILRH 
I I I I I I I M I I I t I i I I I i I I t I i I I t [ I I I I I I I I I I I I I I I I I t I I I t I I I I I I I I I I 
orf 12-1 VEPQLGPYQSDLSQEEKDIRHSNEITPLEYKGLIWAGWFVALSALLAWSIVPADGILRH 

250 260 270 280 290 300 

310 320 330 340 350 360 

orf 12a . pep PETGLVSGSPFLKSI WFIFLLFALPGIVYGRVTRSLRGEQEWNAMAESMSTLGLYLVI 
I I I t I I I I t f I I I I I t i I I I I I M I M I M I I I I I I I I I I I I I I I t I I I I I I I I I I I I I I 
orf 12-1 PETGLVSGSPFLKSIWFIFLLFALPGIVYGRVTRSLRGEQEWNAMAESMSTLGLYLVI 

310 320 330 340 350 360 

370 380 390 400 410 420 

orf 12a . pep IFFAAQFVAFFNWTNIGQYIAVKGATFLKEVGLGGSVLFIGFILICAFINLMIGSASAQW 
I I I I I I I I I I I I I M I t I I ) t I I 1! I I I I I I I I I I I t I I I I I i t I I I I I i I I t I I t M I I 
orf 12-1 IFFAAQFVAFFNWTNIGQYIAVKGATFLKEVGLGGSVLFIGFILICAFINLMIGSASAQW 

370 380 390 400 410 420 

430 440 450 460 470 480 

orf 12a . pep AVTAPIFVPMLMLAGYAPEVIQAAYRIGDSVTNIITPMMSYFGLIMATVIKYKKDAGVGT 
I I I I I t I I ) I t I I I t i I I I I I I I i I I I t I I I I I I I M 1 I I I I I I I i I I ! I i I I I M t I I I 
orf 12-1 AVTAPIFVPMLMLAGYAPEVIQAAYRIGDSVTNIITPMMSYFGLIMATVIKYKKDAGVGT 

430 440 450 460 470 480 

490 500 510 520 

orf 12a. pep LISMMLPYSAFFLIAWIALFCIWVFVLGLPVGPGAPTFYPAPX
IIMlllllMlllintllllllllliMIMIMIIMMI 
orf 12-1 LISMMLPYSAFFLIAWIALFCIWVFVLGLPVGPGAPTFYPAPX 

490 500 510 520 



Homology with a predicted ORF from N. gonorrhoeae 

ORF12 shows 92.5% identity over a 320aa overlap with a predicted ORF (ORF12.ng) from N.
gonorrhoeae:



orf12.pep AXXIIHPXXWGPEANWFFMVASTFVIALI 30
I Mil I t i I i I I I It I : I M I I I i [ I
orf12ng AAAFAGVSGGYSANLFLGTIDPLLAGITQQAAQIIHPDYWGPEANWFFMAASTFVIALI 232

orf12.pep GYFVTEKIVEPQLGPYQSDLSQEEKDIRHSNEITPLEYKGLIWAGWFVALSALLAWSIV 90
t I I t t t II li t t t I II f I t I t I t I t 11 t t I t t t I t t t t 1 t t t I t II t t I t t I t IJ t I I t I
orf12ng GYFVTEKIVEPQLGPYQSDLSQEEKDIRHSNEITPLEYKGLIWAGWFVALSALLAWSIV 292

orf12.pep PADGILRHPETGLVSGSPFLKSIWFIFLLFTVLPGIVYGRVTRSLRGEQEWNAXAESMS 150
t I I t 1 I I I II t I 1 1 : t I I I I 11 t I I t I I I I I t I II t t t I 1 : I I t t t I I : 1 1 I I I Mill
orf12ng PADGILRHPETGLVAGSPFLKSIWFIFLLFALPGIVYGRITRSLRGEREWNAMAESMS 352

orf12.pep TLXLXLXXIFFAAQFVAFFNWTNIGQYIAVKGATFLKEVGLGGSVLFIGFILICAFINLM 210
II I I II t I I I I I I II I t I I II I I I I I I I I : I I I : I I I t I I I I i I t 1 I II I I I I I
orf12ng TLGLYLVIIFFAAQFVAFFNWTNIGQYIAVKGAVFLKKFRLGGSVLFIGFILICAFINLM 412

orf12.pep IGSASAQWAVTAPIFVPMLMLAGYAPEVIQAAYRIGDSVTNIITPMMSYFGLIMATVXXY 270
I I I 11 t I II It I 1 II I I I I I I I t I I : I 1 I I I I I I I I I I t I I I I I I I I I M I I I I I I I
orf12ng IGSASAQWAVTAPIFVPMLMLAGNAPQVIQAAYRIGDSVTNIITPMMSYFGLIMATVIKY 472

orf12.pep KKDAGVGTLIXMMLPYSAFFLIAWIALFCIWVFVLGLPVGPGAPTFYPAP 320

I I M I I I I I I I t i I M I i I I I I I i I I I I i I I I I M I I I I i I : i I t I I : I 
orfl2ng KKDAGVGTLISMMLPYSAFFLIAWIALFCIWVFVLGLPVGPGTPTFYPVP 522 

The complete length ORF12ng nucleotide sequence <SEQ ID 139> is:

1 ATGAGTCAAA CCGACGCGCG TCGTAGCGGA CGATTTTTAC GCACAGTCGA 

51 ATGGCTGGGC AATATGTTGC CGCACCCGGT TACGCTTTTT ATTATTTTCA

101 TTGTGTTATT GCTGATTGCC tCtgCCGTCG GTGCGTATTT CGGACTATCC 

151 GTCCCCGATC CGCGTCCTGT TGGGGCGAAA GGACGTGCCG ATGACGGTTT 

201 GATTCACGTT GTCAGCCTGC TCGATGCCGA CGGTTTGATC AAAATCCTGA 

251 CGCATACCGT TAAAAATTTC ACCGGTTTCG CGCCGTTGGG AACGGTGTTG

301 GTTTCTTTAT TGGGCGTGGG GATTGCGGAA AAATCGGGCT TGATTTCCGC 

351 ATTAATGCGC TTATTGCTCA CAAAATCCCC ACGCAAACTC ACTACTTTTA 

401 TGGTTGTTTT TACAGGGATT TTATCCAATA CGGCTTCTGA ATTGGGCTAT 

451 GTCGTCCTAA TCCCTTTGTC CGCCGTCATC TTTCATTCGC TCGGCCGCCA 

501 TCCGCTTGCC GGTTTGGCTG CGGCTTTCGC CGGCGTTTCG GGCGGTTATT 

551 CGGCCAATCT GTTCTTAGGC ACAATCGATC CGCTCTTGGC AGGCATCACC 

601 CAACAGGCGG CGCAAATCAT CCATCCCGAC TACGTCGTAG GCCCTGAAGC 

651 CAACTGGTTT TTTATGGCAG CCAGTACGTT TGTGATTGCT TTGATTGGTT 

701 ATTTTGTTAC TGAAAAAATC GTCGAACCGC AATTGGGCCC TTATCAATCA 

751 GATTTGTCAC AAGAAGAAAA AGACATTCGG CATTCCAATG AAATCACGCC 

801 TTTGGAATAT AAAGGATTAA TTTGGGCAGG CGTGGTGTTT GTTGCCTTAT 

851 CCGCCCTATT GGCTTGGAGC ATCGTCCCTG CCGACGGTAT TTTGCGTCAT 

901 CCTGAAACAG GATTGGTTGC CGGTTCGCCG TTTTTAAAAT CGATTGTTGT 

951 TTTTATTTTC TTGTTGTTTG CGCTGCCGGG CATTGTTTAT GGCCGGATAA 

1001 CCCGAAGTTT GCGCGGCGAA CGGGAAGTCG TTAATGCGAT GGCCGAATCG 

1051 ATGAGTACTT TGGGACTTTA TTTGGTCATC ATCTTTTTTG CCGCACAGTT 

1101 TGTCGCATTT TTTAATTGGA CGAATATTGG GCAATATATT GCCGTTAAAG 

1151 GGGCGGTGTT CTTAAAAGAA GTCGGCTTGG GCGGCAGTGT GTTGTTTATC 

1201 GGTTTTATTT TAATTTGTGC TTTTATCAAT CTGATGATAG GCTCCGCCTC 

1251 CGCGCAATGG GCGGTAACTG CGCCGATTTT CGTCCCTATG CTGATGTTGG

1301 CCGGCTACGC GCCCGAAGTC ATTCAAGCCG CTTACCGCAT CGGTGATTCC 

1351 GTTACCAATA TTATTACGCC GATGATGAGT TATTTCGGGC TGATTATGGC 

1401 GACGGTAATC AAATACAAAA AAGATGCGGG CGTAGGCACG CTGATTTCTA 

1451 TGATGTTGCC GTATTCCGCT TTCTTCTTAA TTGCATGGAT CGCCTTATTC 

1501 TGCATTTGGG TATTTGTTTT GGGTCTGCCC GTCGGTCCCG GCACACCCAC 

1551 ATTCTATCCG GTGCCTTAA 

This encodes a protein having amino acid sequence <SEQ ID 140>: 

1 MSQTDARRSG RFLRTVEWLG NMLPHPVTLF IIFIVLLLIA SAVGAYFGLS 

51 VPDPRPVGAK GRADDG LIHV VSLLDAPGLI KIL THTVKNF TG FAPLGTVL 

101 VSLLGVGIAE KSGLISALMR LLLTKSPRKL TTFMWFTGI LSNTASELGY 

151 WLIPLSAVI FHSL GRHPLA GLAAAFAGVS GGYSANLFLG TIDPLLAGIT 

201 QQAAQIIHPD YWGPEANWF FMAASTFVIA LIGYFVT EKI VEPQLGPYQS 

251 DLSQEEKDIR HSNEITPLEY KGLIW AGWF VALSALLAWS IV PADGILRH 

301 PETGLVAGSP FLKS IWFIF LLFALPGIVY G RITRSLRGE REWNAMAES 

351 MST LGLYLVI IFFAAQFVAF FNWTNIGQYI AVKGAVFLKK FRLGGS VLFI 

401 GFILICAFIN LMI GSASAQW AVTAPIFVPM LMLAGNAPQV IQAAYRIGDS 

451 VTN IITPMMS YFGLIMATVI KYKKDAGVGT LISMMLPYSA FFLIAWIALF 

501 CIWVFVL GLP VGPGTPTFYP VP* 

ORF12ng shows 97.1% identity in 522 aa overlap with ORF12-1:

10 20 30 40 50 60 

MSQTDTQRDGRFLRTVEWLGNMLPHPVTLFIIFIVLLLIASAVGAYFGLSVPDPRPVGAK
|||||::|:|||||||llllllltltlllltlllllliniillltMlllllltlllli 
MSQTDARRSGRFLRTVEWLGNMLPHPVTLFIIFIVLLLIASAVGAYFGLSVPDPRPVGAK 
10 20 30 40 50 60 

70 80 90 100 110 120 

GRADDGLIYIVSLLNADGFIKILTHTVKNFTGFAPLGTVLVSLLGVGIAEKSGLISALMR 
I I I I t M I :: I 1 I I : I I I : t I I I I I t i I I I It M I I I I I It M I I I I I I I I ( t I I I I I I I 
GRADDGLIHWSLLDADGLIKILTHTVKNFTGFAPLGTVLVSLLGVGIAEKSGLISALMR 
70 80 90 100 110 120 

130 140 150 160 170 180 

LLLTKSPRKLTTFMWFTGILSNTASELGYWLIPLSAIIFHSLGRHPLAGLAAAFAGVS 
I I t I i I I t 1 II I t t 1 1 I t I I II II 1 1 II t I I 1 II I t 1 I : t 1 t I i I I I I II I I I I I I I I I 1 
LLLTKSPRKLTTFMWFTGILSNTASELGYWLIPLSAVIFHSLGRHPLAGLAAAFAGVS 



130 140 150 160 170 180

190 200 210 220 230 240 

GGYSANLFLGTIDPLLAGITQQAAQIIHPDYWGPEANWFFMVASTFVIALIGYFVTEKI 
t I I I I I I I I I I t I [ t I I I I I I I I I I I I I I I I M I I I I I I i I I : I i I I t I I I I I I I I I I I I 
GGYSANLFLGTIDPLLAGITQQAAQIIHPDYWGPEANWFFMAASTFVIALIGYFVTEKI 
190 200 210 220 230 240 

250 260 270 280 290 300 

VEPQLGPYQSDLSQEEKDIRHSNEITPLEYKGLIWAGWFVALSALLAWSIVPADGILRH 
I 1 I I I I I I I I i t I I I I I I I I I i I I t i I I I M I I I i I I M i I I I I I 1 I I I ) 1 i I I I I I I t I 
VEPQLGPYQSDLSQEEKDIRHSNEITPLEYKGLIWAGWFVALSALLAWSIVPADGILRH 
250 260 270 280 290 300 

310 320 330 340 350 360 

PETGLVSGSPFLKSIWFIFLLFALPGIVYGRVTRSLRGEQEWNAMAESMSTLGLYLVI 
I I I I I I : I I M i I i I I I I I I I I t I M I I I I I I : I M I M I : I I I I I M M I I I I I I I I t I 
PETGLVAGSPFLKSIWFIFLLFALPGIVYGRITRSLRGEREWNAMAESMSTLGLYLVI 
310 320 330 340 350 360 

370 380 390 400 410 420 

IFFAAQFVAFFNWTNIGQYIAVKGATFLKEVGLGGSVLFIGFILICAFINLMIGSASAQW 
)|||||||Millllltllllllll:itllllllliltMlltlilltlIIIIIIIIIII 
IFFAAQFVAFFNWTNIGQYIAVKGAVFLKEVGLGGSVLFIGFILICAFINLMIGSASAQW 
370 380 390 400 410 420 

430 440 450 460 470 480 

AVTAPIFVPMLMLAGYAPEVIQAAYRIGDSVTNIITPMMSYFGLIMATVIKYKKDAGVGT 
I I I I I I I I t I M I I I I M i I M I t M I i i M t I I t I I I I I I I I I t I I I t I I I I I I t t I I I 
AVTAPIFVPMLMLAGYAPEVIQAAYRIGDSVTNIITPMMSYFGLIMATVIKYKKDAGVGT 
430 440 450 460 470 480 

490 500 - 510 520 

LISMMLPYSAFFLIAWIALFCIWVFVLGLPVGPGAPTFYPAPX 
I I I I t I I I I 1 M I I M I I i I I I i It j I I I I I I 1 I: I I i I I : i I 
LISMMLPYSAFFLIAWIALFCIWVFVLGLPVGPGTPTFYPVPX 
490 500 510 520 

In addition, ORF12ng shows significant homology with a hypothetical protein from E.coli:

sp|P46133|YDAH_ECOLI HYPOTHETICAL 55.1 KD PROTEIN IN OGT-DBPA INTERGENIC REGION
>gi|1787597 (AE000231) hypothetical protein in ogt 5' region [Escherichia coli]
Length = 510
Score = 329 bits (835), Expect = 2e-89

Identities = 178/507 (35%), Positives = 281/507 (55%), Gaps = 15/507 (2%) 



Query: 8 RSGRFLRTVEWLGNMLPHPVTXXXXXXXXXXXASAVGAYFGLSVPDPRPVGAKGRADDGL 67
+SG+ VE +GN +PHP +A+ + FG+S +P D
Sbjct: 13 QSGKLYGWVERIGNKVPH PFLLFI YLI I VLMVTTAI LSAFC^VS AKN P T DGT P 64

Query: 68 IHWSLLDADGLIKILTHTVKNFTGFAPXXXXXXXXXXXXIAEKSGLISALMRLLLTKSP 127
+ V +LL +GL L + +KNF+GFAP +AE+ GL+ ALM + +
Sbjct: 65 VWKNLLSVEGLHWFLPNVIKNFSGFAPLGAILALVLGAGLAERVGLLPALMVKMASHVN 124

Query: 128 RKLTTFMWFTGILSNTASELGYWLIPLSAVIFHSLGRHPLAGLAAAFAGVSGGYSANL 187
+ ++MV+F S+ +S+ V++ P+ A+IF ++GRHP+AGL AA AGV G++ANL
Sbjct: 125 ARYASYMVLFIAFFSHISSDAALVIMPPMGALIFLAVGRHPVAGLLAAIAGVGCGFTANL 184

Query: 188 FLGTIDPLLAGITQQAAQIIHPDYWGPEANWFFMAASTFVIALIGYFVTEKIVEPQLGP 247
+ T D LL+GI+ +AA +P V NW+FMA+S V+ ++G +T+KI+EP+LG
Sbjct: 185 LIVTTDVLLSGISTEAAAAFNPQMHVSVIDNWYFMASSVWLTIVGGLITDKIIEPRLGQ 244

Query: 248 YQSDLSQEEKDIRHSNEITPLEYKGLIWAGWFVALSALLAWSIVPADGILRHPETGLVA 307
+Q + + + S GL AGW + A +A ++P +GILR P V
Sbjct: 245 WQGNSDEKLQTLTESQRF GLRIAGWSLLFIAAIALMVIPQNGILRDPINHTVM 298

Query: 308 GSPFLKSIWFIFLLFALPGIVYGRITRSLRGEREWNAMAESMSTLGLYLXXXXXXXXX 367
SPF+K IV I L F + + YG TR++R + ++ + M E M + ++
Sbjct: 299 PSPFIKGIVPLIILFFFWSLAYGIATRTIRRQADLPHLMIEPMKEMAGFIVMVFPLAQF 358

Query: 368 XXXXNWTNIGQYIAVKGAVFLKEVGLGGSVLFIGFILICAFINLMIGSASAQWAVTAPIF 427
NW+N+G++IAV L+ GL G F+G L+ +F+ + I S SA W++ APIF
Sbjct: 359 VAMFNWSNMGKFIAVGLTDILESSGLSGIPAFVGLALLSSFLCMFIASGSAIWSILAPIF 418

Query: 428 VPMLMLAGYAPEVIQAAYRIGDSVTNIITPMMSYFGLIMATVIKYKKDAGVGTLISMMLP 487
VPM ML G+ P Q +RI DS + P+ + L + + +YK DA +GT S++LP
Sbjct: 419 VPMFMLLGFH PAFAQI LFRI ADS SVLPLAPVS PFVPLFLGFLQRYKPDAKLGT Y YSLVLP 478

Query: 488 YSAFFLIAWIALFCIWVFVLGLPVGPG 514
Y FL+ W+ + W +++GLP+GPG
Sbjct: 479 YPLIFLWWLLMLLAW-YLVGLPIGPG 504

Based on this analysis, including the presence of several putative transmembrane domains and the
predicted actinin-type actin-binding domain signature (shown in bold) in the gonococcal protein,
it is predicted that the proteins from N.meningitidis and N.gonorrhoeae, and their epitopes, could
be useful antigens for vaccines or diagnostics, or for raising antibodies.
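
The text does not say how the putative transmembrane domains were identified. One common approach is a sliding-window hydropathy scan, sketched below with the Kyte-Doolittle scale. The scale, the 19-residue window and the 1.6 threshold are assumptions chosen for illustration and are not taken from this document; the function names are hypothetical. The input is the first 60 residues of the gonococcal protein as shown in the ORF12-1/ORF12ng alignment above.

```python
# Kyte-Doolittle hydropathy values (assumed scale, not named in the text).
KYTE_DOOLITTLE = {
    "A": 1.8, "R": -4.5, "N": -3.5, "D": -3.5, "C": 2.5,
    "Q": -3.5, "E": -3.5, "G": -0.4, "H": -3.2, "I": 4.5,
    "L": 3.8, "K": -3.9, "M": 1.9, "F": 2.8, "P": -1.6,
    "S": -0.8, "T": -0.7, "W": -0.9, "Y": -1.3, "V": 4.2,
}


def hydropathy_profile(protein: str, window: int = 19) -> list:
    """Mean hydropathy of every full window; unknown residues score 0."""
    scores = [KYTE_DOOLITTLE.get(residue, 0.0) for residue in protein]
    return [
        sum(scores[i:i + window]) / window
        for i in range(len(scores) - window + 1)
    ]


def candidate_tm_windows(protein: str, window: int = 19,
                         threshold: float = 1.6) -> list:
    """1-based start positions of windows at or above the threshold."""
    profile = hydropathy_profile(protein, window)
    return [i + 1 for i, mean in enumerate(profile) if mean >= threshold]


orf12ng_start = "MSQTDARRSGRFLRTVEWLGNMLPHPVTLFIIFIVLLLIASAVGAYFGLSVPDPRPVGAK"
hits = candidate_tm_windows(orf12ng_start)
print(hits)
```

Consecutive start positions in the output describe one hydrophobic stretch; merging them into a single predicted segment is left out of the sketch.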

Example 17

The following partial DNA sequence was identified in N. meningitidis <SEQ ID 141>: 

1 . .ACAGCCGGCG CAGCAGGTTn CnCGGTCTTC GTTTTCGTAA CGGACAGTCA 

51 GGTGGAGGTG TTCGGGAACA TCCAGACCGC AGTGGAAACA GGTTTTTTTC 

101 ATGGCATTTC GGTTTCGTCT GTGTTTGGTG CGGCGGCACA AGACTCGGCA 

151 ATgGCTTCGC GCAGTGCGTC TATACCGGTA TTTTCAGCAA CGGAAATGCG

201 GACGGcGgCA ATTTTTCCCG CAGCGTCGCG CCATATGCCC GTGTTTTgTT 

251 CTTCAGACGG CAGCAGGTCG GTTTTGTTGT ACACCTTgAT GCACGGAaTA 

301 TCGCCGGCAT GGATTTCTTG CAGTACGTTT TCCACGTCTT CAATCTGCTG 

351 TCCGCTGTTC GGAGCGGCGG CATCGACGAC GTGCAGCAGC ACATCgGcTT 

401 gCGCGGTTTC TTCCAGCGTG GCgGAAAAGG CGGAAATCAG TTTgTGCGGC

451 agATyGCTnA CGAATCCGAC GGTATCGGTC AGGATAATGC TGCATTCGGG 

501 ACT.. 

This corresponds to the amino acid sequence <SEQ ID 142; ORF14>:

1 ..TAGAAGXXVF VFVTDSQVEV FGNIQTAVET GFFHGISVSS VFGAAAQDSA 

51 MASRSASIPV FSATEMRTAA IFPAASRHMP VFCSSDGSRS VLLYTLMHGI

101 SPAWISCSTF STSSICCPLF GAAASTTCSS TSACAVSSSV AEKAEISLCG

151 RXLTNPTVSV RIMLHSG.. 

Computer analysis of this amino acid sequence gave the following results: 

Homology with a predicted ORF from N.meningitidis (strain A)
ORF14 shows 94.0% identity over a 167aa overlap with an ORF (ORF14a) from strain A of N.
meningitidis:

10 20 30 

orf 14 . pep TAGAAGXXVFVFVTDSQVEVFGNIQTAVET 

t:Mll llllll|:[::llll:| MM 
orf 14a GRQLGFLRVGGALFVITAQARVNNALCDCLTTGAAGFAVFVFVTDGQMQVFGNVQPAVET

150 160 170 180 190 200 

40 50 60 70 80 90 

orf 14 .pep GFFHGISVSSVFGAAAQDSAMASRSASIPVFSATEMRTAAIFPAASRHMPVFCSSDGSRS 
I I I I I I I I I I i I I t t I I I I I I I I I I I I M I i I I I I I M I I I I M I M I I I I I I I I 1 I I I

orf 14a GFFHGISVSSVFGAAAQYSAMASRSASIPVFSATEMRTAAIFPAASRHMPVFCSSDGSRS 
210 220 230 240 250 260 

100 110 120 130 140 150 

orf 14 .pep VLLYTLMHGISPAWISCSTFSTSSICCPLFGAAASTTCSSTSACAVSSSVAEKAEISLCG

1 1 1 1 M 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 i 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 i I M I i 1 1 n I I I I I 

orf 14a VLLYTLMHGISPAWISCSTFSTSSICCPLFGAAASTTCSSTSACAVSSSVAEKAEISLCG 
270 280 290 300 310 320 
160

orf 14 . pep RXLTNPTVSVRIMLHSG 
I t I t t I I I I t I I t I I I 

orf 14a RSLTNPTVSVRIMLHSGLMYSRRAWSSVAKSWSFAYMPDLVSRLNRLDLPTLVX 
330 340 350 360 370 380 

The complete length ORF14a nucleotide sequence <SEQ ID 143> is:



1 ATGGAGGATT TGCAGGAAAT CGGGTTCGAT GTCGCCGCCG TAAAGGTAGG 

51 TCGGCAGCGC GAACATCATC GTCTGCATCA TCCCCAGCCC GGCAACGGCG 

101 AGGCGGACGA TGTATTGTTT GCGTTCTTTT TGGTTGGCGG CTTCGATTTT 

151 TTGCGCGTCA TAGGGTGCGG CGGTGTAGCC TATCTGCCTG ATTTTCAACA 

201 GAATGTCGGA AAGGCGGATT TTGCCGTCGT CCCAGACGAC GCGGCAGCGG 

251 TGCGTGCTGT AATTGAGGTC GATGCGGACG ATGCCGTCTG TACGCAAAAG 

301 CTGCTGTTCG ATCAGCCAGA CGCAGGCGGC GCAGGTGATG CCGCCGAGCA 

351 TTAAAACCGC CTCGCGCGTG CCGCCGTGGG TTTCCACAAA GTCGGACTGG 

401 ACTTCGGGCA GGTCGTACAG GCGGATTTGG TCGAGGATTT CTTGGGGCGG

451 CAGCTCGGTT TTTTGCGCGT CGGCGGTGCG TTGTTTGTAA TAACTGCCCA 

501 AGCCCGCGTC AATAATGCTT TGTGCGACTG CCTGACAACC GGCGCAGCAG 

551 GTTTCGCGGT CTTCGTTTTC GTAACGGACG GTCAGATGCA GGTTTTCGGG 

601 MCGTCCAGC CCGCAGTGGA AACAGGTTTT TTTCATGGCA TTTCGGTTTC 

651 GTCTGTGTTT GGTGCGGCGG CACAATACTC GGCAATGGCT TCGCGCAGTG 

701 CGTCTATACC GGTATTTTCA GCAACGGAAA TGCGGACGGC GGCAATTTTT 

751 CCCGCAGCGT CGCGCCATAT GCCCGTGTTT TGTTCTTCAG ACGGCAGCAG 

801 GTCGGTTTTG TTGTACACCT TGATGCACGG AATATCGCCG GCATGGATTT 

851 CTTGCAGTAC GTTTTCCACG TCTTCAATCT GCTGTCCGCT GTTCGGAGCG 

901 GCGGCATCGA CGACGTGCAG CAGCACATCG GCTTGCGCGG TTTCTTCCAG 

951 CGTGGCGGAA AAGGCGGAAA TCAGTTTGTG CGGCAGATCG CTGACGAATC 

1001 CGACGGTATC GGTCAGGATA ATGCTGCATT CGGGACTGAT GTACAGCCGC 

1051 CGCGCCGTCG TGTCGAGTGT GGCGAAAAGC TGGTCTTTCG CATATATGCC 

1101 CGACTTGGTC AGCCGGTTGA ACAGACTGGA TTTGCCGACA TTGGTATAG 

This encodes a protein having amino acid sequence <SEQ ID 144>: 



1 MEDLQEIGFD VAAVKVGRQR EHHRLHHPQP GNGEADDVLF AFFLVGGFDF 

51 LRVIGCGGVA YLPDFQQNVG KADFAWPDD AAAVRAVIEV DADDAVCTQK 

101 LLFDQPDAGG AGDAAEH*NR LARAAVGFHK VGLDFGQWQ ADLVEDFLGR 

151 QLGFLRVGGA LFVITAQARV NNALCDCLTT GAAGFAVFVF VTDGQMQVFG 

201 KVQPAVETGF FHGISVSSVF GAAAQYSAMA SRSASIPVFS ATEMRTAAIF 

251 PAASRHMPVF CSSDGSRSVL LYTLMHGISP AWISCSTFST SSICCPLFGA 

301 AASTTCSSTS ACAVSSSVAE KAEISLCGRS LTNPTVSVRI MLHSGLMYSR 

351 RAWSSVAKS WSFAYMPDLV SRLNRLDLPT LV* 



It should be noted that this sequence includes a stop codon at position 118. 
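
The position of an internal stop codon can be confirmed directly from the nucleotide listing. The sketch below scans a stretch of DNA in frame and reports the codon numbers that are stops. The input is bases 301-360 of SEQ ID 143 as listed above, which begin at codon 101; `in_frame_stops` is a hypothetical helper name.

```python
STOP_CODONS = {"TAA", "TAG", "TGA"}


def in_frame_stops(dna: str, first_codon_number: int = 1) -> list:
    """Return the codon numbers of all in-frame stop codons.

    Whitespace is ignored. The first base of `dna` must be the first
    base of codon `first_codon_number`.
    """
    dna = "".join(dna.split()).upper()
    return [
        first_codon_number + i // 3
        for i in range(0, len(dna) - 2, 3)
        if dna[i:i + 3] in STOP_CODONS
    ]


# Bases 301-360 of SEQ ID 143; base 301 starts codon 101.
fragment = (
    "CTGCTGTTCG ATCAGCCAGA CGCAGGCGGC GCAGGTGATG CCGCCGAGCA "
    "TTAAAACCGC"
)
stops = in_frame_stops(fragment, first_codon_number=101)
print(stops)
```

The reported codon straddles the boundary between the rows numbered 301 and 351 in the listing, which is why it is easy to miss by eye.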
Homology with a predicted ORF from N. gonorrhoeae 

ORF14 shows 89.8% identity over a 167aa overlap with a predicted ORF (ORF14.ng) from N.
gonorrhoeae:



orf14.pep TAGAAGXXVFVFVTDSQVEVFGNIQTAVET 30
II M f I I : i I : I : I : : I I I I : i I I I I
orf14ng GRQFGFFRVGGASFVITAQAGIDDALCDCLTADAAGFAVFAFVADGQMQVFGNVQPAVET 208

orf14.pep GFFHGISVSSVFGAAAQDSAMASRSASIPVFSATEMRTAAIFPAASRHMPVFCSSDGSRS 90
I I I I I I I I I II I I I M I II I I I II II i I II I I II t II M II II II I II M I t I tl t t I I
orf14ng GFFHGISVSSVFGAAAQYSAMASRSASIPVFSATEMRTAAIFPAASRHMPVFCSSDGSRS 268

orf14.pep VLLYTLMHGISPAWISCSTFSTSSICCPLFGAAASTTCSSTSACAVSSSVAEKAEISLCG 150
I 1 I I I t I I I I I I I I M I I I I I I 1 I 1 I I I I U I M t I I i I I t I : t I I : i I I t I i I I I i I
orf14ng VLLYTLMHGISWAWISCSTFSTSSICCPLFRAAASTTCSSTSACTVSSKVAEKAEISLCG 328

orf14.pep RXLTNPTVSVRIMLHSG 167
I I I I I I I t I I I It I : I
orf14ng RSLTNPTVSVRIMLHAGLMYSRRAWSRVAKSWSFAYMPDLVSRLNRLDLPTLV 382



The complete length ORF14ng nucleotide sequence <SEQ ID 145> is predicted to encode a protein
having amino acid sequence <SEQ ID 146>:

1 MEDLQEIGFD VAAVKVGRQR EHHRLHHTQS GNGKAD DVLF AFFLVGGFDF

51 LRVIGCGGVA CLPDFQQNVG EADFAWPDD AAAVRAVIEV DADDAVCAQK 

101 LLFDQPDAGG AGNAAEHQHC FVRAIMGFHK VGLDFGQWQ ADLVEDFLGR 

151 QFGFFRVGGA SFVITAQAGI DDALCDCLTA DAAGFAVFAF VADGQMQVFG 

201 NVQPAVETGF FHGISVSSVF GAAAQYSAMA SRSASIPVFS ATEMRTAAIF 

251 PAASRHMPVF CSSDGSRSVL LYTLMHGISW AWISCSTFST SSICCPLFRA 

301 AASTTCSSTS ACTVSSKVAE KAEISLCGRS LTNPTVSVRI MLHAGLMYSR 

351 RAVVSRVAKS WSFAYMPDLV SRLNRLDLPT LV*

Based on the putative transmembrane domain in the gonococcal protein, it is predicted that the
proteins from N.meningitidis and N.gonorrhoeae, and their epitopes, could be useful antigens for
vaccines or diagnostics, or for raising antibodies.



Example 18 

The following partial DNA sequence was identified in N.meningitidis <SEQ ID 147>:



1 . GGCCATTACT CCGACCGCAC TTGGAAGCCG CGTTTGGNCG GCCGCCGTCT
51 GCCGTATCTG CTTTATGGCA CGCTGATTGC GGTTATTGTG ATGATTTTGA
101 TGCCGAACTC GGGCAGCTTC GGTTTCGGCT ATGCGTCGCT GGCGGCTTTG
151 TCGTTCGGCG CGCTGATGAT TGCGCTGTTA GACGTGTCGT CAAATATGGC
201 GATGCAGCCG TTTAAGATGA TGGTCGGCGA CATGGTCAAC GAGGAGCAGA
251 AAA.NTACGC CTACGGGATT CAAAGTTTCT TAGCAAATAC GGGCGCGGTC
301 GTGGCGGCGA TTCTGCCGTT TGTGTTTGCG TATATCGGTT TGGCGAACAC
351 CGCCGANAAA GGCGTTGTGC CGCAGACCGT GGTCGTGGCG TTTTATGTGG
401 GTGCGGCGTT GCTGGTGATT ACCAGCGCGT TCACGATTTT CAAAGTGAAG
451 GAATACGANC CGGAAACCTA CGCCCGTTAC CACGGCATCG ATGTCGCCGC
501 GAATCAGGAA AAAGCCAACT GGATCGCACT CTTAAAA.CC GCGC. .



This corresponds to the amino acid sequence <SEQ ID 148; ORF16>:

1 ..GHYSDRTWKP RLXGRRLPYL LYGTLIAVIV MILMPNSGSF GFGYASLAAL 

51 SFGALMIALL DVSSNMAMQP FKMMVGDMVN EEQKXYAYGI QSFLANTGAV 

101 VAAILPFVFA YIGLANTAXK GVVPQTVVVA FYVGAALLVI TSAFTIFKVK

151 EYXPETYARY HGIDVAANQE KANWIALLKX A. . 
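The correspondence between a partial DNA sequence and its amino acid sequence can be reproduced by conceptual translation. The following is a minimal sketch, assuming the standard genetic code, a sequence that starts in frame, and the listing convention that `N` or `.` marks an undetermined base (any codon containing one is reported as `X`); the name `translate` and its parameters are hypothetical.

```python
from itertools import product

# Standard genetic code, codons enumerated in TCAG order (first base slowest).
BASES = "TCAG"
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    "".join(codon): aa
    for codon, aa in zip(product(BASES, repeat=3), AMINO_ACIDS)
}


def translate(dna: str, unknown: str = "X") -> str:
    """Translate an in-frame DNA listing; ambiguous codons become `unknown`."""
    # Remove the spaces between ten-base blocks but keep '.' and 'N',
    # because each placeholder occupies one base position.
    bases = "".join(dna.split()).upper()
    usable = len(bases) - len(bases) % 3  # ignore a trailing partial codon
    return "".join(
        CODON_TABLE.get(bases[i:i + 3], unknown)
        for i in range(0, usable, 3)
    )


# First row of <SEQ ID 147>, without the leading placeholder dot.
row_1 = "GGCCATTACT CCGACCGCAC TTGGAAGCCG CGTTTGGNCG GCCGCCGTCT"
print(translate(row_1))
```

Because each printed row holds 50 bases, rows after the first do not generally begin on a codon boundary; the rows should be concatenated before translation rather than translated one at a time.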

Further work revealed the complete nucleotide sequence <SEQ ID 149>: 



1 ATGTCGGAAT ATACGCCTCA AACAGCAAAA CAAGGTTTGC CCGCGCTGGC
51 AAAAAGCACG ATTTGGATGC TCAGTTTCGG CTTTCTCGGC GTTCAGACGG
101 CCTTTACCCT GCAAAGCTCG CAAATGAGCC GCATTTTTCA AACGCTAGGC
151 GCAGACCCGC ACAATTTGGG CTGGTTTTTC ATCCTGCCGC CGCTGGCGGG
201 GATGCTGGTG CAGCCGATTG TCGGCCATTA CTCCGACCGC ACTTGGAAGC
251 CGCGTTTGGG CGGCCGCCGT CTGCCGTATC TGCTTTATGG CACGCTGATT
301 GCGGTTATTG TGATGATTTT GATGCCGAAC TCGGGCAGCT TCGGTTTCGG
351 CTATGCGTCG CTGGCGGCTT TGTCGTTCGG CGCGCTGATG ATTGCGCTGT
401 TAGACGTGTC GTCAAATATG GCGATGCAGC CGTTTAAGAT GATGGTCGGC
451 GACATGGTCA ACGAGGAGCA GAAAGGCTAC GCCTACGGGA TTCAAAGTTT
501 CTTAGCAAAT ACGGGCGCGG TCGTGGCGGC GATTCTGCCG TTTGTGTTTG
551 CGTATATCGG TTTGGCGAAC ACCGCCGAGA AAGGCGTTGT GCCGCAGACC
601 GTGGTCGTGG CGTTTTATGT GGGTGCGGCG TTGCTGGTGA TTACCAGCGC
651 GTTCACGATT TTCAAAGTGA AGGAATACGA TCCGGAAACC TACGCCCGTT
701 ACCACGGCAT CGATGTCGCC GCGAATCAGG AAAAAGCCAA CTGGATCGAA
751 CTCTTGAAAA CCGCGCCTAA GGCGTTTTGG ACGGTTACTT TGGTGCAATT
801 CTTCTGCTGG TTCGCCTTCC AATATATGTG GACTTACTCG GCAGGCGCGA
851 TTGCGGAAAA CGTCTGGCAC ACCACCGATG CGTCTTCCGT AGGTTATCAG
901 GAGGCGGGTA ACTGGTACGG CGTTTTGGCG GCGGTGCAGT CGGTTGCGGC
951 GGTGATTTGT TCGTTTGTAT TGGCGAAAGT GCCGAATAAA TACCATAAGG
1001 CGGGTTATTT CGGCTGTTTG GCTTTGGGCG CGCTCGGCTT TTTCTCCGTT
1051 TTCTTCATCG GCAACCAATA CGCGCTGGTG TTGTCTTATA CCTTAATCGG
1101 CATCGCTTGG GCGGGCATTA TCACTTATCC GCTGACGATT GTGACCAACG
1151 CCTTGTCGGG CAAGCATATG GGCACTTACT TGGGCTTGTT TAACGGCTCT
1201 ATCTGTATGC CTCAAATCGT CGCTTCGCTG TTGAGTTTCG TGCTTTTCCC
1251 TATGCTGGGC GGCTTGCAGG CCACTATGTT CTTGGTAGGG GGCGTCGTCC
1301 TGCTGCTGGG CGCGTTTTCC GTGTTCCTGA TTAAAGAAAC ACACGGCGGG
1351 GTTTGA
This corresponds to the amino acid sequence <SEQ ID 150; ORF16-1>:



1 MSEYTPQTAK QGLPALAKST IWMLSFGFLG VQTAFTLQSS QMSRIFQTLG
51 ADPHNLGWFF ILPPLAGMLV QPIVGHYSDR TWKPRLGGRR LPYLLYGTLI
101 AVIVMILMPN SGSFGFGYAS LAALSFGALM IALLDVSSNM AMQPFKMMVG
151 DMVNEEQKGY AYGIQSFLAN TGAVVAAILP FVFAYIGLAN TAEKGVVPQT
201 VVVAFYVGAA LLVITSAFTI FKVKEYDPET YARYHGIDVA ANQEKANWIE
251 LLKTAPKAFW TVTLVQFFCW FAFQYMWTYS AGAIAENVWH TTDASSVGYQ
301 EAGNWYGVLA AVQSVAAVIC SFVLAKVPNK YHKAGYFGCL ALGALGFFSV
351 FFIGNQYALV LSYTLIGIAW AGIITYPLTI VTNALSGKHM GTYLGLFNGS
401 ICMPQIVASL LSFVLFPMLG GLQATMFLVG GVVLLLGAFS VFLIKETHGG
451 V*

Computer analysis of this amino acid sequence gave the following results: 
Homology with a predicted ORF from N.meningitidis (strain A)

ORF16 shows 96.7% identity over a 181aa overlap with an ORF (ORF16a) from strain A of
N.meningitidis:



10 20 30 

or f 1 6 . pep GHYSDRTWKPRLXGR RLPYLLYGTLIAVIV 

I I I i I I I I I i I I I I I I I 11 i I I I I I I i I 1 
orf 1 6a IFQTLGADPHSLG WFFILPPLAGMLVQPIVG HYSDRTWKPRLGGRR LPYLLYGTLIAVIV 
50 60 70 80 90 100 



40 50 60 70 80 90 

or f 1 6 . pep MILMPNSGSFGFGY ASLAALSFGALMIALLDV SSNMAMQPFKMMVGDMVNEEQKXYAYGI 
i I I n t i [ I I I I I I I I i t I I I I I I I I I I i i I I I i I I I t I I I I I I I I I i i I I t f I Mill 
orf 16a- MILMPNSGSFGFGY ASLAALSFGALMIALLDV SSNMAMQPFKMMVGDMVNEEQKGYAYGI 
110 120 130 140 150 160 



100 110 120 130 140 150 

orf 16. pep QSFLANTG AWAAILPFVFAYIGLA NTAXKGWPQT VWAFYVGAALLVITSA FTIFKVK 
II I II I I II II II I I I t I I I II t I I I II II I I I I I t III I I II I I I It I I i I I I I I I t I 
orf 16a QSFLANTG AWAAILPFVFAYIGLAN TAEKGWPQT VWAFYVGAALLVITSA FTIFKVK 
170 180 190 200 210 220 



160 170 180 

orf 16 . pep EYXPETYARYHGIDVTVANQEKANWIALLKXA 
II I II I I I I i I I I I I I I I I I I I I I 111:1 
orf 16a EYNPETYARYHGIDVAANQEKANWIELLKTAPKAFWTVTLVQFFCWFAFQYMWTYSAGAI 
230 240 250 260 270 280 



orf 16a AENVWHTTDASSVGYQEAGNWYG VLAAVQSVAAVICSFVLA KVPNKYHKAGYFGCLALGA 
290 300 310 320 330 340 

The complete length ORF16a nucleotide sequence <SEQ ID 151> is:



1 ATGTCGGAAT ATACGCCTCA AACAGCAAAA CAAGGTTTGC CCGCGCTGGC
51 AAAAAGCACG ATTTGGATGC TCAGTTTCGG CTTTCTCGGC GTTCAGACGG
101 CCTTTACCCT GCAAAGCTCG CAGATGAGCC GCATCTTCCA GACGCTCGGT
151 GCCGATCCGC ACAGCCTCGG CTGGTTCTTT ATCCTGCCGC CGCTGGCGGG
201 GATGCTGGTG CAGCCGATTG TCGGCCATTA CTCCGACCGC ACTTGGAAGC
251 CGCGTTTGGG CGGCCGCCGT CTGCCGTATC TGCTTTATGG CACGCTGATT
301 GCGGTTATTG TGATGATTTT GATGCCGAAC TCGGGCAGCT TCGGTTTCGG
351 CTATGCGTCG CTGGCGGCTT TGTCGTTCGG CGCGCTGATG ATTGCGCTGT
401 TAGACGTGTC GTCAAATATG GCGATGCAGC CGTTTAAGAT GATGGTCGGC
451 GACATGGTCA ACGAGGAGCA GAAAGGCTAC GCCTACGGGA TTCAAAGTTT
501 CTTAGCGAAT ACGGGCGCGG TCGTGGCGGC GATTCTGCCG TTTGTGTTTG
551 CGTATATCGG TTTGGCGAAC ACCGCCGAGA AAGGCGTTGT GCCGCAGACC
601 GTGGTCGTGG CGTTTTATGT GGGTGCGGCG TTGCTGGTGA TTACCAGCGC
651 GTTCACGATT TTCAAAGTGA AGGAATACAA TCCGGAAACC TACGCCCGTT
701 ACCACGGCAT CGATGTCGCC GCGAATCAGG AAAAAGCCAA CTGGATCGAA
751 CTCTTGAAAA CCGCGCCTAA GGCGTTTTGG ACGGTTACTT TGGTGCAATT
801 CTTCTGCTGG TTCGCCTTCC AATATATGTG GACTTACTCG GCAGGCGCGA
851 TTGCGGAAAA CGTCTGGCAC ACCACCGATG CGTCTTCCGT AGGTTATCAG
901 GAGGCGGGTA ACTGGTACGG CGTTTTGGCG GCGGTGCAGT CGGTTGCGGC
951 GGTGATTTGT TCGTTTGTAT TGGCGAAAGT GCCGAATAAA TACCATAAGG
1001 CGGGTTATTT CGGCTGTTTG GCTTTGGGCG CGCTCGGCTT TTTCTCCGTT
1051 TTCTTCATCG GCAACCAATA CGCGCTGGTG TTGTCTTATA CCTTAATCGG
1101 CATCGCTTGG GCGGGCATTA TCACTTATCC GCTGACGATT GTGACCAACG
1151 CCTTGTCGGG CAAGCATATG GGCACTTACT TGGGCCTGTT TAACGGCTCT
1201 ATCTGTATGC CGCAAATCGT CGCTTCGCTG TTGAGTTTCG TGCTTTTCCC
1251 TATGCTGGGC GGCTTGCAGG CCACTATGTT CTTGGTAGGG GGCGTCGTCC
1301 TGCTGCTGGG CGCGTTTTCC GTGTTCCTGA TTAAAGAAAC ACACGGCGGG
1351 GTTTGA



This encodes a protein having amino acid sequence <SEQ ID 152>: 



1 MSEYTPQTAK QGLPALAKST IWMLSFGFLG VQTAFTLQSS QMSRIFQTLG
51 ADPHSLGWFF ILPPLAGMLV QPIVGHYSDR TWKPRLGGRR LPYLLYGTLI
101 AVIVMILMPN SGSFGFGYAS LAALSFGALM IALLDVSSNM AMQPFKMMVG
151 DMVNEEQKGY AYGIQSFLAN TGAVVAAILP FVFAYIGLAN TAEKGVVPQT
201 VVVAFYVGAA LLVITSAFTI FKVKEYNPET YARYHGIDVA ANQEKANWIE
251 LLKTAPKAFW TVTLVQFFCW FAFQYMWTYS AGAIAENVWH TTDASSVGYQ
301 EAGNWYGVLA AVQSVAAVIC SFVLAKVPNK YHKAGYFGCL ALGALGFFSV
351 FFIGNQYALV LSYTLIGIAW AGIITYPLTI VTNALSGKHM GTYLGLFNGS
401 ICMPQIVASL LSFVLFPMLG GLQATMFLVG GVVLLLGAFS VFLIKETHGG
451 V*



ORF16a and ORF16-1 show 99.6% identity in 451 aa overlap:



                     10        20        30        40        50        60
orf16a.pep   MSEYTPQTAKQGLPALAKSTIWMLSFGFLGVQTAFTLQSSQMSRIFQTLGADPHSLGWFF
             ||||||||||||||||||||||||||||||||||||||||||||||||||||||:|||||
orf16-1      MSEYTPQTAKQGLPALAKSTIWMLSFGFLGVQTAFTLQSSQMSRIFQTLGADPHNLGWFF
                     10        20        30        40        50        60

                     70        80        90       100       110       120
orf16a.pep   ILPPLAGMLVQPIVGHYSDRTWKPRLGGRRLPYLLYGTLIAVIVMILMPNSGSFGFGYAS
             ||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
orf16-1      ILPPLAGMLVQPIVGHYSDRTWKPRLGGRRLPYLLYGTLIAVIVMILMPNSGSFGFGYAS
                     70        80        90       100       110       120

                    130       140       150       160       170       180
orf16a.pep   LAALSFGALMIALLDVSSNMAMQPFKMMVGDMVNEEQKGYAYGIQSFLANTGAVVAAILP
             ||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
orf16-1      LAALSFGALMIALLDVSSNMAMQPFKMMVGDMVNEEQKGYAYGIQSFLANTGAVVAAILP
                    130       140       150       160       170       180

                    190       200       210       220       230       240
orf16a.pep   FVFAYIGLANTAEKGVVPQTVVVAFYVGAALLVITSAFTIFKVKEYNPETYARYHGIDVA
             ||||||||||||||||||||||||||||||||||||||||||||||:|||||||||||||
orf16-1      FVFAYIGLANTAEKGVVPQTVVVAFYVGAALLVITSAFTIFKVKEYDPETYARYHGIDVA
                    190       200       210       220       230       240

                    250       260       270       280       290       300
orf16a.pep   ANQEKANWIELLKTAPKAFWTVTLVQFFCWFAFQYMWTYSAGAIAENVWHTTDASSVGYQ
             ||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
orf16-1      ANQEKANWIELLKTAPKAFWTVTLVQFFCWFAFQYMWTYSAGAIAENVWHTTDASSVGYQ
                    250       260       270       280       290       300

                    310       320       330       340       350       360
orf16a.pep   EAGNWYGVLAAVQSVAAVICSFVLAKVPNKYHKAGYFGCLALGALGFFSVFFIGNQYALV
             ||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
orf16-1      EAGNWYGVLAAVQSVAAVICSFVLAKVPNKYHKAGYFGCLALGALGFFSVFFIGNQYALV
                    310       320       330       340       350       360

                    370       380       390       400       410       420
orf16a.pep   LSYTLIGIAWAGIITYPLTIVTNALSGKHMGTYLGLFNGSICMPQIVASLLSFVLFPMLG
             ||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
orf16-1      LSYTLIGIAWAGIITYPLTIVTNALSGKHMGTYLGLFNGSICMPQIVASLLSFVLFPMLG
                    370       380       390       400       410       420

                    430       440       450
orf16a.pep   GLQATMFLVGGVVLLLGAFSVFLIKETHGGVX
             ||||||||||||||||||||||||||||||||
orf16-1      GLQATMFLVGGVVLLLGAFSVFLIKETHGGVX
                    430       440       450
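The identity figures quoted for each pair of ORFs can be recomputed from the aligned sequences. The following is a minimal sketch, assuming that identity is the fraction of alignment columns holding the same residue in both sequences, that `-` marks a gap, and that the overlap length counts every aligned column; the function name `percent_identity` is hypothetical, and the alignment program used for the original figures may count columns differently.

```python
def percent_identity(aligned_a: str, aligned_b: str, gap: str = "-") -> float:
    """Percentage of aligned columns in which both sequences carry the same residue."""
    a = aligned_a.upper()
    b = aligned_b.upper()
    if len(a) != len(b):
        raise ValueError("aligned sequences must have the same length")
    if not a:
        raise ValueError("alignment is empty")
    # A column counts as identical only if neither side is a gap.
    identical = sum(1 for x, y in zip(a, b) if x == y and x != gap)
    return 100.0 * identical / len(a)


# Residues 51-60 of ORF16a and ORF16-1, which differ at one position (S/N).
print(percent_identity("ADPHSLGWFF", "ADPHNLGWFF"))
```

Over the full 451-residue overlap, ORF16a and ORF16-1 differ at two positions (S/N at residue 55 and N/D at residue 227), and 449/451 rounds to the 99.6% stated above.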
Homology with a predicted ORF from N.gonorrhoeae

ORF16 shows 93.9% identity over a 181aa overlap with a predicted ORF (ORF16.ng) from
N.gonorrhoeae:



orf 16-pep 


GHYSDRTWKPRLXGRRLPYLLYGTLIAVIV 


30 


1: 1 1 1 M 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 I i 1 1 It 1 




orf 16ng 


HFSNARRRPAQFGLVFHPAAAGGDAGSADSGYYSDRTWKPRLGGRRLPYLLYGTLIAVIV 


131 


orf 16,pep 


MILMPNSGSFGFGYASLAALSFGALMIALLDVSSNMAMQPFKMMVGDMVNEEQKXYAYGI 


90 


M 1 M 1 1 t 1 1 t t 1 1 t 1 t 1 t 1 1 1 1 t 1 1 M 1 1 1 1 1 1 t i 1 1 1 t M 1 1 1 1 1 M 1 1 1 1 1 1 t t 1 t 




orf 16ng 


MILMPNSGSFGFGYASLAALSFGALMIALLDVSSNMAMQPFKMMVGDMVNEEQKSYAYGI 


191 


orf 16 .pep 


QSFLANTGAWAAILPFVFAYIGLANTAXKGWPQTVWAFYVGAALLVITSAFTIFKVK 


150 




1 1 1 1 1 1 1 1 1 1 1 1 i 1 t t 1 1 i 1 1 1 1 1 1 1 1 1 1 1 t 1 t 1 t t 1 1 M M 1 1 1 1 : 1 1 1 1 1 1 1 III 




orf 16ng 


QSFIANTDAVVAAILPFVFAYIGIANTAEKGVVPQTVVVAFYVGAALLIITSAFTISKVK 


251 


orf 16. pep 


EYXPETYARYHGIDVAANQEKANWIALLKXA 


181 


II t 1 t 1 1 1 i II 1 1 1 1 1 1 II 1 1 t 1 : IM:I 




orf 16ng 


EYDPETYARYHGIDVAANQEKANWFELLKTAPKVFWTVTPVQFFCWFAFRYMWTYSAGAI 


311 



The complete length ORF16ng nucleotide sequence <SEQ ID 153> is:

1 ATGATAGGGG ATCGCCGCGC CGGCAACCAT TTCGGATTTT CCAAAGCAAA 

51 TACTTTTCAA ATCAAAAAAA AGGATTTACT TTATGTCGGA ATATACGCCT 

101 CAAACAGCAA AACAAGGTTT GCCCGCGCCG GCAAAAAGCA CGATTTGGAT 

151 GTTGAGCTTC GGCTATCTCG GCGTTCAGAC GGCCTTTACC CTGCAAAGCT 

201 CGCAGATGAG CCGCATTTTT CAAACGCTAG GCGCAGACCC GCACAATTTG 

251 GGCTGGTTTT TCATCCTGCC GCCGCTGGCG GGGATGCTGG TTCAGCCGAT 

301 AGTGGCTACT ACTCAGACCG CACTTGGAAG CCGCGCTTGG GCGGCCGCCG 

351 CCTGCCGTAT CTGCTTTACG GCACGCTGAT TGCGGTCATC GTGATGATTT 

401 TGATGCCGAA CTCGGGCAGC TTCGGTTTCG GCTATGCGTC GCTGGCGGCC 

451 TTGTCGTTCG GCGCGCTGAT GATTGCGCTG TTGGACGTGT CGTCGAATAT 

501 GGCGATGCAG CCGTTTAAGA TGATGGTCGG CGATATGGTC AACGAGGAGC 

551 AGAAAAGCTA CGCCTACGGG ATTCAAAGTT TCTTAGCGAA TACGGACGCG 

601 GTTGTGGCAG CGATTCTGCC GTTTGTGTTC GCGTATATCG GTTTGGCGAA 

651 CACTGCCGAG AAAGGCGTTG TGCCACAAAC CGTGGTCGTA GCATTCTATG 

701 TGGGTGCGGC GTTACTGATT ATTACCAGTG CGTTCACAAT CTCCAAAGTC 

751 AAAGAATACG ACCCGGAAAC CTACGCCCGT TACCACGGCA TCGATGTCGC 

801 CGCGAATCAG GAAAAAGCCA ACTGGTTCGA ACTCTTAAAA ACCGCGCCTA

851 AAGTGTTTTG GACGGTTACT CCGGTACAGT TTTTCTGCTG GTTCGCCTTC 

901 CGGTATATGT GGACTTACTC GGCAGGCGCG ATTGCAGAAA ACGTCTGGCA 

951 CACTACCGAT GCGTCTTCCG TAGGCCATCA GGAGGCGGGC AACCGGTACG 

1001 GCGTTTTGGC GGCGGTGTAG 

This encodes a protein having amino acid sequence <SEQ ID 154>: 

1 MIGDRRAGNH FGFSKANTFQ IKKKDLLYVG IYASNSKTRF ARAGKKHDLD

51 VELRLSRRSD GLYPAKLADE PHFSNARRRP AQFGLVFHPA AAGGDAGSAD

101 SGYYSDRTWK PRLGGRRLPY LLYGTLIAVI VMILMPNSGS FGFGYASLAA

151 LSFGALMIAL LDVSSNMAMQ PFKMMVGDMV NEEQKSYAYG IQSFLANTDA

201 VVAAILPFVF AYIGLANTAE KGVVPQTVVV AFYVGAALLI ITSAFTISKV

251 KEYDPETYAR YHGIDVAANQ EKANWFELLK TAPKVFWTVT PVQFFCWFAF

301 RYMWTYSAGA IAENVWHTTD ASSVGHQEAG NRYGVLAAV*

ORF16ng and ORF16-1 show 89.3% identity in 261 aa overlap:

30 40 50 60 70 80 

MLSFGFLGVQTAFTLQSSC^SRIFQTLGADPHNLGWFFILPPLAGMLVQPI-VGHYSDRT 

I :: I I I M : |: I I I I I 

DVELRLSRRSDGLYPAKLADEPHFSNARRRPAQFGLVF-HPAAAGGDAGSADSGYYSDRT 
50 60 70 80 90 100 

90 100 110 120 130 140 

WKPRLGGRRLPYLLYGTLIAVIVMILMPNSGSFGFGYASLAALSFGALMIALLDVSSNMA 
I II I II 1 I I II I I I I I I I I I I I I II I II I I I I I t I I I I II I It I I I I M II I M I II II i 
WKPRLGGRRLPYLLYGTLIAVIVMILMPNSGSFGFGYASLAALSFGALMIALLDVSSNMA 
110 120 130 140 150 160 



orf 16-1. pep 
orf I6ng 

orf 16-1. pep 
orf 16ng 
150 160 170 180 190 200

orf 16-1. pep MQPFKMMVGDMVNEEQKGYAYGIQSFIANTGAWAAILPEVFAYIGLANTAEKGVVPQTV 
M 1 M i I I t I I I I I M I : I I I i I I t I ! t I i I t I n I I I t I I I I t I I I I It M I I I I I I I 
orfl6ng MQPFKMMVGDMVNEEQKSYAYGIQSFLANTDAWAAILPEVFAYIGLANTAEKGWPQTV 
170 180 190 200 210 220 



210 220 230 240 250 260 

WAETVGAALLVITSAFTIFKVKEYDPETYARYHGIDVAANQEKANWIELLKTAPKAFWT 
I I t I I I I i t M: I 1 I I t I I lllilltllM)lllllilinillil:lllltlll:|M 
WAFYVGAALLIITSAFTISKVKEYDPETYARYHGIDVAANQEKANWFELLKTAPKVFWT 
230 240 250 260 270 280 

270 280 290 300 310 320 

VTLVQFFCWFAFQYMWTYSAGAIAENVWHTTDASSVGYQEAGNWYGVLAAVQSVAAVICS 
M I I M I I I I I : I I t I I I I I I I I I I I I M t I I I I I I : i I I i I I M I I t I 
VTPVQFFCWFAFRYMWTYSAGAIAENVWHTTDASSVGHQEAGNRYGVLAAVX 
290 300 310 320 330 340 



Based on this analysis, including the presence of several putative transmembrane domains in the
gonococcal protein, it is predicted that the proteins from N.meningitidis and N.gonorrhoeae, and
their epitopes, could be useful antigens for vaccines or diagnostics, or for raising antibodies.



orf 16-1. pep 
orf 16ng 

orf 16-1 .pep 
orfl6ng 



Example 19 



The following partial DNA sequence was identified in N.meningitidis <SEQ ID 155>:

1 ATGTTGTTCC GTAAAACGAC CGCCGCCGTT TTGGCGCATA CCTTGATGCT 

51 GAACGGCTGT ACGTTGATGT TGTGGGGAAT GAACAACCCG GTCAGCGAAA 

101 CAATCACCCG NAAACACGTT GNCAAAGACC AAATCCGNGN CTTCGGTGTG 

151 GTTGCCGAAG ACAATGCCCA ATTGGAAAAG GGCAGCCTGG TGATGATGGG 

201 CGGAAAATAC TGGTTCGTCG TCAATCCCGA AGATTCGGCG AA.NTGACGG 

251 GNATTTTGAN GGCAGGGCTG GACAAACCCT TCCAAATAGT TNAGGATACC 

301 CCGAGCTATG C.TGCCACCA AGCCCTGCCG GTCAAACTCG GATCGNCTGG 

351 CAGCCAGAAT. . . 

This corresponds to the amino acid sequence <SEQ ID 156; ORF28>: 



1 MLFRKTTAAV LAHTLMLNGC TLMLWGMNNP VSETITRKHV XKDQIRXFGV 
51 VAEDNAQLEK GSLVMMGGKY WFVVNPEDSA XXTGILXAGL DKPFQXVXDT
101 PSYXCHQALP VKLGSXGSQN. . . 

Further work revealed the complete nucleotide sequence <SEQ ID 157>: 



1 ATGTTGTTCC GTAAAACGAC CGCCGCCGTT TTGGCGGCAA CCTTGATGCT 

51 GAACGGCTGT ACGTTGATGT TGTGGGGAAT GAACAACCCG GTCAGCGAAA 

101 CAATCACCCG CAAACACGTT GACAAAGACC AAATCCGCGC CTTCGGTGTG 

151 GTTGCCGAAG ACAATGCCCA ATTGGAAAAG GGCAGCCTGG TGATGATGGG 

201 CGGAAAATAC TGGTTCGTCG TCAATCCCGA AGATTCGGCG AAGCTGACGG 

251 GCATTTTGAA GGCAGGGCTG GACAAACCCT TCCAAATAGT TGAGGATACC 

301 CCGAGCTATG CTCGCCACCA AGCCCTGCCG GTCAAACTCG AATCGCCTGG 

351 CAGCCAGAAT TTCAGTACCG AAGGCCTTTG CCTGCGCTAC GATACCGACA 

401 AGCCTGCCGA CATCGCCAAG CTGAAACAGC TCGGGTTTGA AGCGGTCAAA 

451 CTCGACAATC GGACCATTTA CACGCGCTGC GTATCCGCCA AAGGCAAATA 

501 CTACGCCACA CCGCAAAAAC TGAACGCCGA TTACCATTTT GAGCAAAGTG 

551 TGCCTGCCGA TATTTATTAC ACGGTTACTG AAGAACATAC CGACAAATCC 

601 AAGCTGTTTG CAAATATCTT ATATACGCCC CCCTTTTTGA TACTGGATGC 

651 GGCGGGCGCG GTACTGGCCT TGCCTGCGGC GGCTCTGGGT GCGGTCGTGG 

701 ATGCCGCCCG CAAATGA 

This corresponds to the amino acid sequence <SEQ ID 158; ORF28-1>:



1 MLFRKTTAAV LAATLMLNGC TLMLWGMNNP VSETITRKHV DKDQIRAFGV

51 VAEDNAQLEK GSLVMMGGKY WFVVNPEDSA KLTGILKAGL DKPFQIVEDT

101 PSYARHQALP VKLESPGSQN FSTEGLCLRY DTDKPADIAK LKQLGFEAVK

151 LDNRTIYTRC VSAKGKYYAT PQKLNADYHF EQSVPADIYY TVTEEHTDKS

201 KLFANILYTP PFLILDAAGA VLALPAAALG AVVDAARK*

Computer analysis of this amino acid sequence gave the following results:
Homology with a predicted ORF from N.meningitidis (strain A)

ORF28 shows 79.2% identity over a 120aa overlap with an ORF (ORF28a) from strain A of
N.meningitidis:



orf28.pep 
orf28a 

orf28.pep 
orf28a 

orf28a 



10 20 30 40 50 60 

MLETIKTTAAVLAHTLMLNG CTLMLWGMNNPVSETITRKHVXKDQIRXFGVVAEDNAQLEK 
IIIIMIIIItl llltllll:|:|lll:[ tli :ltll IMII IIIMIMMIII 
MLFRKTTAAVlAATIJdl^G CrrVMMWGMNSPFSETTARKHVDKDQIRAFGVVAEDNAQLEK 

10 20 30 40 50 60 

70 80 90 100 110 120 

GSLVMMGGKYWFWNPEDSAXXTGILXAGLDKPFQIVXDTPSYXCHQALPVKLGSXGSQN 

I I i I ! I I I M I t t I [ I I I I I Mil MttI 11:1 :l : :|llllll I :ill 
GSLVMMGGKYWFWNPEDSAKLTGILKAGLDKQFQMVEPNPRFA-YQALPVKLESPASQN 
70 80 90 100 110 

FSTEGLCLRYDTDRPADIAKLKQLEFEAVELDNRTIYTRCVSAKGKYYATPQKLNADYHF 
120 130 140 150 160 170 



The complete length ORF28a nucleotide sequence <SEQ ID 159> is: 



1 ATGTTGTTCC GTAAAACGAC CGCCGCCGTT TTGGCGGCAA CCTTGATGTT 

51 GAACGGCTGT ACGGTAATGA TGTGGGGTAT GAACAGCCCG TTCAGCGAAA 

101 CGACCGCCCG CAAACACGTT GACAAGGACC AAATCCGCGC CTTCGGTGTG 

151 GTTGCCGAAG ACAATGCCCA ATTGGAAAAG GGCAGCCTGG TGATGATGGG 

201 CGGGAAATAC TGGTTCGTCG TCAATCCTGA AGATTCGGCG AAGCTGACGG 

251 GCATTTTGAA GGCCGGGTTG GACAAGCAGT TTCAAATGGT TGAGCCCAAC 

301 CCGCGCTTTG CCTACCAAGC CCTGCCGGTC AAACTCGAAT CGCCCGCCAG 

351 CCAGAATTTC AGTACCGAAG GCCTTTGCCT GCGCTACGAT ACCGACAGAC 

401 CTGCCGACAT CGCCAAGCTG AAACAGCTTG AGTTTGAAGC GGTCGAACTC 

451 GACAATCGGA CCATTTACAC GCGCTGCGTC TCCGCCAAAG GCAAATACTA 

501 CGCCACACCG CAAAAACTGA ACGCCGATTA TCATTTTGAG CAAAGTGTGC 

551 CTGCCGATAT TTATTACACG GTTACGAAAA AACATACCGA CAAATCCAAG

601 TTGTTTGAAA ATATTGCATA TACGCCCACC ACGTTGATAC TGGATGCGGT 

651 GGGCGCGGTG CTGGCCTTGC CTGTCGCGGC GTTGATTGCA GCCACGAATT 

701 CCTCAGACAA ATGA 

This encodes a protein having amino acid sequence <SEQ ID 160>: 



1 MLFRKTTAAV LAATLMLNGC TVMMWGMNSP FSETTARKHV DKDQIRAFGV

51 VAEDNAQLEK GSLVMMGGKY WFVVNPEDSA KLTGILKAGL DKQFQMVEPN

101 PRFAYQALPV KLESPASQNF STEGLCLRYD TDRPADIAKL KQLEFEAVEL

151 DNRTIYTRCV SAKGKYYATP QKLNADYHFE QSVPADIYYT VTKKHTDKSK

201 LFENIAYTPT TLILDAVGAV LALPVAALIA ATNSSDK*

ORF28a and ORF28-1 show 86.1% identity in 238 aa overlap: 



10 20 30 40 50 60 

orf 28a . pep MLFRKTTAAVLAATLMLNGCTVMMWGMNSPFSETTARKHVDKDQIRAFGWAEDNAQLEK 
lllllllllllllllllllll:|:illl:l III : I II I I II I I II I II t I M I II I I I 
or f 2 8 - 1 MLFRKTTAAVLAATLMLNGCTLMLWGMNN PVSET ITRKHVDKDQIRAFGWAEDNAQLEK 

10 20 30 40 50 60 



70 80 90 100 110 119 

orf 28a . pep GSLVMMGGKYWFWNPEDSAKLTGILKAGLDKQFQMVEPNPRFA-YQALPVKLESPASQN 
II I I I It II I I I I t I II t It II I I I t II I I t I 11:11 : I : I : 1 M II i I I I I : I I I 
orf 28-1 GSLVMMGGKYWFWNPEDSAKLTGILKAGLDKPFQIVEDTPSYARHQALPVKLESPGSQN 

70 80 90 100 110 120 

120 130 140 150 160 170 179 

orf 28a. pep FSTEGLCLRYDTDRPADIAKLKQLEFEAVELDNRTIYTRCVSAKGKYYATPQKLNADYHF 
Mlllllllllll:lllllllill lllhllllllliMlltllllllllllllllill 
orf 28-1 FSTEGLCLRYDTDKPADIAKLKQLGFEAVKLDNRTIYTRCVSAKGKYYATPQKLNADYHF 

130 140 150 160 170 180 
180 190 200 210 220 230 

orf 28a . pep EQSVPADIYYTVTKKHTDKSKLFENIAYTPTTLILDAVGAVLALPVAALIAATNSSDBQC 
I I I 1 I I I I I I I I I :: I I I I I t I I il ill 11111:1111111:111 I::::: II 
orf 28-1 EQSVPADIYYTVTEEHTDKSKLFANILYTPPFLILDAAGAVLALPAAALGAWDAARKX 

190 200 210 220 230 

Homology with a predicted ORF from N.gonorrhoeae

ORF28 shows 84.2% identity over a 120aa overlap with a predicted ORF (ORF28.ng) from
N.gonorrhoeae:



orf 28. pep 
orf2Bng 
orf 2 8. pep 
orf28ng 



MLFRKTTAAVLAHTLMLNGCTLMLWGMNNPVSETITRKHVXKDQIRXEX3WAEDNAQLEK 60 
ilMIIMIIII lhltlll:ll 1111111:111111! Mill lilllllllllll 

MLFRKTTAAVIAATLILNGCTMMLRGMNNPVSQTITRKHVDKDQIRAFGVVAEDNAQLEK 60 

GSLVMMGGKYWFWNPEDSAXXTGILXAGLDKPFQIVXDTPSYXCHQALPVKLGSXGSQN 120 
I t I I I If M I II : II I I I I i M : I I I II II II II t I I I I i I I II II : : II I i 

GSLVMMGGKYWFAVNPEDSAKLTGLLKAGLDKPFQIVEDTPSYARHQALPVKFEAPGSQN 120 



The complete length ORF28ng nucleotide sequence <SEQ ID 161> is:

1 ATGTTGTTCC GTAAAACGAC CGCCGCCGTT TTGGCGGCAA CCTTGATACT
51 GAACGGCTGT ACGATGATGT TGCGGGGGAT GAACAACCCG GTCAGCCAAA
101 CAATCACCCG CAAACACGTT GACAAAGACC AAATCCGCGC CTTCGGTGTG
151 GTTGCCGAAG ACAATGCCCA ATTGGAAAAG GGCAGCCTGG TGATGATGGG
201 CGGGAAATAC TGGTTCGCCG TCAATCCCGA AGATTCGGCG AAGCTGACGG
251 GCCTTTTGAA GGCCGGGTTG GACAAGCCCT TCCAAATAGT TGAGGATACC
301 CCGAGCTATG CCCGCCACCA AGCCCTGCCG GTCAAATTCG AAGCGCCCGG
351 CAGCCAGAAT TTCAGTACCG GAGGTCTTTG CCTGCGCTAT GATACCGGCA
401 GACCTGACGA CATCGCCAAG CTGAAACAGC TTGAGTTTAA AGCGGTCAAA
451 CTCGACAATC GGACCATTTA CACGCGCTGC GTATCCGCCA AAGGCAAATA
501 CTACGCCACG CCGCAAAAAC TGAACGCCGA TTATCATTTT GAGCAAAGTG
551 TGCCCGCCGA TATTTATTAT ACGGTTACTG AAAAACATAC CGACAAATCC
601 AAGCTGTTTG GAAATATCTT ATATACGCCC CCCTTGTTGA TATTGGATGC
651 GGCGGCCGCG GTGCTGGTCT TGCCTATGGC TCTGATTGCA GCCGCGAATT
701 CCTCAGACAA ATGA



This encodes a protein having amino acid sequence <SEQ ID 162>: 

1 MLFRKTTAAV LAATLILNGC TMMLRGMNNP VSQTITRKHV DKDQIRAFGV

51 VAEDNAQLEK GSLVMMGGKY WFAVNPEDSA KLTGLLKAGL DKPFQIVEDT

101 PSYARHQALP VKFEAPGSQN FSTGGLCLRY DTGRPDDIAK LKQLEFKAVK

151 LDNRTIYTRC VSAKGKYYAT PQKLNADYHF EQSVPADIYY TVTEKHTDKS

201 KLFGNILYTP PLLILDAAAA VLVLPMALIA AANSSDK*

ORF28ng and ORF28-1 share 90.0% identity in 231 aa overlap:

10 20 30 40 50 60 

orf 2 8- 1 . pep MLFRKTTAAVIAATLMLNGCTLMLWQ^NNPVSETITRKHVDKDQIRAFGVVAEDNAQLEK 
I I I I I I I I I I I I I I I : I II I I : I I I I I I I I I : I t II II I i I I II t II I t I I M II II I I 
orf28ng MLFRKTTAAVLAATLILNGCTMMLR04NNPVSQTITRKHVDKDQIRAFGWAEDNAQLEK 

10 20 30 40 50 60 

70 80 90 100 110 120 

orf 28-1 . pep GSLVMMGGKYWFWNPEDSAKLTGILKAGLDKPFQIVEDTPSYARHQALPVKLESPGSQN 
I I t I I i I! I I I I : II I II t I M II : I I I I I I II I II I II I I I I I I I II I I I I : I : I I I I I 
orf28ng GSLVMMGGKYWFAVNPEDSAKLTGLLKAGLDKPFQIVEDTPSYARHQALPVKFEAPGSQN 

70 80 90 100 110 120 

130 140 150 160 170 180 

orf 28-1 . pep FSTEGLCLRYDTDKPADIAKLKQLGFEAVKLDNRT I YTRCVSAKGKYYAT PQKLNADYHF 
III IIMIIII :l lllllllt I:|lllllllllllllllllltlllllllllllll 
orf28ng FSTGGLCLRYDTGRPDDIAKLKQLEFKAVKLDNRTIYTRCVSAKGKYYATPQKLNADYHF 

130 140 150 160 170 180 

190 200 210 220 230 239 

orf 28-1 . pep EQSVPADIYYTVTEEHTDKSKLFANILYTPPFLILDAAGAVLALPAAALGAWDAARKX 

I I t I I I I I I II I I I : I II I I I 1 I : 1 II I I I 1 : I i I I I I : I I I : I I I : : I : 
orf28ng EQSVPADIYYTVTEKHTDKSKLFGNILYTPPLLILDAAAAVLVLPMALIAAANSSDKX 



190 200 210 220 230

Based on this analysis, including the presence of a putative transmembrane domain in the 
gonococcal protein, it was predicted that the proteins from N.meningitidis and N.gonorrhoeae, and 
their epitopes, could be useful antigens for vaccines or diagnostics, or for raising antibodies. 

ORF28-1 (24kDa) was cloned in pET and pGex vectors and expressed in E.coli, as described
above. The products of protein expression and purification were analyzed by SDS-PAGE. Figure
6A shows the results of affinity purification of the GST-fusion protein, and Figure 6B shows the
results of expression of the His-fusion in E.coli. Purified GST-fusion protein was used to immunise
mice, whose sera were used for ELISA, which gave a positive result. These experiments confirm
that ORF28-1 is a surface-exposed protein, and that it may be a useful immunogen.

Example 20 

The following partial DNA sequence was identified in N.meningitidis <SEQ ID 163>:

1 . .GTCAGTCCTG TACTGCCTAT TACACACGAA CGGACAGGGT TTGAAGGTGT

51 TATCGGTTAT GAAACCCATT TTTCAGGGCA CGGACATGAA GTACACAGTC 

101 CGTTCGATCA TCATGATTCA AAAAGCACTT CTGATTTCAG CGGCGGTGTA 

151 GACGGCGGTT TTACTGTTTA CCAACTTCAT CGAACATGGT CGGAAATCCA 

201 TCCGGAGGAT GAATATGACG GGCCGCAAGC AGCG.ATTAT CCGCCCCCCG 

251 GAGGAGCAAG GGATATATAC AGCTATTATG TCAAAGGAAC TTCAACAAAA 

301 ACAAAGACTA GTATTGTCCC TCAAGCCCCA TTTTCAGACC GTTGGCTAGA 

351 AGAAAATGCC GGTGCCGCCT CTGGT. . 

This corresponds to the amino acid sequence <SEQ ID 164; ORF29>:

1 . .VSPVLPITHE RTGFEGVIGY ETHFSGHGHE VHSPFDHHDS KSTSDFSGGV
51 DGGFTVYQLH RTWSEIHPED EYDGPQAAXY PPPGGARDIY SYYVKGTSTK 
101 TKTSIVPQAP FSDRWLEENA GAASG. . 

Further work revealed the complete nucleotide sequence <SEQ ID 165>: 

1 ATGAATTTGC CTATTCAAAA ATTCATGATG CTGTTTGCAG CAGCAATATC 

51 GTTGCTGCAA ATCCCCATTA GTCATGCGAA CGGTTTGGAT GCCCGTTTGC 

101 GCGATGATAT GCAGGCAAAA CACTACGAAC CGGGTGGTAA ATACCATCTG 

151 TTTGGTAATG CTCGCGGCAG TGTTAAAAAG CGGGTTTACG CCGTCCAGAC 

201 ATTTGATGCA ACTGCGGTCA GTCCTGTACT GCCTATTACA CACGAACGGA 

251 CAGGGTTTGA AGGTGTTATC GGTTATGAAA CCCATTTTTC AGGGCACGGA 

301 CATGAAGTAC ACAGTCCGTT CGATCATCAT GATTCAAAAA GCACTTCTGA 

351 TTTCAGCGGC GGTGTAGACG GCGGTTTTAC TGTTTACCAA CTTCATCGAA 

401 CAGGGTCGGA AATCCATCCG GAGGATGGAT ATGACGGGCC GCAAGGCAGC 

451 GATTATCCGC CCCCCGGAGG AGCAAGGGAT ATATACAGCT ATTATGTCAA

501 AGGAACTTCA ACAAAAACAA AGACTAATAT TGTCCCTCAA GCCCCATTTT 

551 CAGACCGTTG GCTAAAAGAA AATGCCGGTG CCGCCTCTGG TTTTTTCAGC 

601 CGTGCGGATG AAGCAGGAAA ACTGATATGG GAAAGCGACC CCAATAAAAA 

651 TTGGTGGGCT AACCGTATGG ATGATGTTCG CGGCATCGTC CAAGGTGCGG 

701 TTAATCCTTT TTTAATGGGT TTTCAAGGAG TAGGGATTGG GGCAATTACA 

751 GACAGTGCAG TAAGCCCGGT CACAGATACA GCCGCGCAGC AGACTCTACA 

801 AGGTATTAAT GATTTAGGAA AATTAAGTCC GGAAGCACAA CTTGCTGCCG 

851 CGAGCCTATT ACAGGACAGT GCTTTTGCGG TAAAAGACGG TATCAACTCT 

901 GCCAAACAAT GGGCTGATGC CCATCCAAAT ATAACAGCTA CTGCCCAAAC 

951 TGCCCTTTCC GCAGCAGAGG CCGCAGGTAC GGTTTGGAGA GGTAAAAAAG 

1001 TAGAACTTAA CCCGACTAAA TGGGATTGGG TTAAAAATAC CGGTTATAAA 

1051 AAACCTGCTG CCCGCCATAT GCAGACTTTA GATGGGGAGA TGGCAGGTGG 

1101 GAATAAACCT ATTAAATCTT TACCAAACAG TGCCGCTGAA AAAAGAAAAC 

1151 AAAATTTTGA GAAGTTTAAT AGTAACTGGA GTTCAGCAAG TTTTGATTCA 
1201 GTGCACAAAA CACTAACTCC CAATGCACCT GGTATTTTAA GTCCTGATAA

1251 AGTTAAAACT CGATACACTA GTTTAGATGG AAAAATTACA ATTATAAAAG 

1301 ATAACGAAAA CAACTATTTT AGAATCCATG ATAATTCACG AAAACAGTAT 

1351 CTTGATTCAA ATGGTAATGC TGTGAAAACC GGTAATTTAC AAGGTAAGCA 

1401 AGCAAAAGAT TATTTACAAC AACAAACTCA TATCAGGAAC TTAGACAAAT 

1451 GA 

This corresponds to the amino acid sequence <SEQ ID 166; ORF29-1>:

1 MNLPIQKFMM LFAAAISLLQ IPISHANGLD ARLRDDMQAK HYEPGGKYHL

51 FGNARGSVKK RVYAVQTFDA TAVSPVLPIT HERTGFEGVI GYETHFSGHG

101 HEVHSPFDHH DSKSTSDFSG GVDGGFTVYQ LHRTGSEIHP EDGYDGPQGS

151 DYPPPGGARD IYSYYVKGTS TKTKTNIVPQ APFSDRWLKE NAGAASGFFS

201 RADEAGKLIW ESDPNKNWWA NRMDDVRGIV QGAVNPFLMG FQGVGIGAIT 

251 DSAVSPVTDT AAQQTLQGIN DLGKLSPEAQ LAAASLLQDS AFAVKDGINS 

301 AKQWADAHPN ITATAQTALS AAEAAGTVWR GKKVELNPTK WDWVKNTGYK 

351 KPAARHMQTL DGEMAGGNKP IKSLPNSAAE KRKQNFEKFN SNWSSASFDS 

401 VHKTLTPNAP GILSPDKVKT RYTSLDGKIT IIKDNENNYF RIHDNSRKQY 

451 LDSNGNAVKT GNLQGKQAKD YLQQQTHIRN LDK* 

Computer analysis of this amino acid sequence gave the following results: 
Homology with a predicted ORF from N.meningitidis (strain A)

ORF29 shows 88.0% identity over a 125aa overlap with an ORF (ORF29a) from strain A of
N.meningitidis:



10 20 30 

orf 29 . pep VSPVLPITHERTGFEGVIGYETHFSGHGHE 

|:|:||lil)[|IMI:||||||lillltl 
orf 29a EPGGKYHLFGNARGSVKNRVYAVQTFDATAVGPILPITHERTGFEGIIGYETHFSGHGHE 
50 60 70 80 90 100 



40 50 60 70 80 90 

orf 29 . pep VHSPFDHHDSKSTSDFSGGVDGGFTVYQLHRTWSEIHPEDEYDGPQAAXYPPPGGARDIY 
|||il|:||lillltllllllllllllMtll lltltll mil:: lllilMtlil 
orf 29a VHSPFDNHDSKSTSDFSGGVDGGFTVYQLHRTGSEIHPEDGYDGPQGSDYPPPGGARDIY 
110 120 130 140 150 160 



100 110 120 

orf 29 . pep SYYVKGTSTKTKTSIVPQAPFSDRWLEENAGAASG 
IIIIIMMI::ttl:ltllllll:t)ltltll 
orf 2 9a XXYVKGTSTBCTKSNIVPRAPFSDRWLKENAGA7VSGFFSRADEAGKLIWESDPNKNWWANR 
170 180 190 200 210 220 



orf 29a MDDIRGIVQGAVNPFLMGFQGVGIGAITDSAVSPVTDTAAQQTLQGXNHLGXLSPEAQLA 
230 240 250 260 270 280 

The complete length ORF29a nucleotide sequence <SEQ ID 167> is: 



1 ATGAATTNGC CTATTCAAAA ATTCATGATG CTGTTTGCAG CAGCAATATC 

51 GTNGCTGCAA ATCCCNATTA GTCATGCGAA CGGTTTGGAT GCCCGTTTGC 

101 GCGATGATAT GCAGGCAAAA CACTACGAAC CGGGTGGTAA ATACCATCTG 

151 TTTGGTAATG CTCGCGGCAG TGTTAAAAAT CGGGTTTACG CCGTCCAAAC 

201 ATTTGATGCA ACTGCGGTCG GCCCCATACT GCCTATTACA CACGAACGGA 

251 CAGGATTTGA AGGCATTATC GGTTATGAAA CCCATTTTTC AGGACATGGA 

301 CATGAAGTAC ACAGTCCGTT CGATAATCAT GATTCAAAAA GCACTTCTGA 

351 TTTCAGCGGC GGCGTAGACG GTGGTTTTAC CGTTTACCAA CTTCATCGGA 

401 CAGGGTCGGA AATCCATCCG GAGGATGGAT ATGACGGGCC GCAAGGCAGC 

451 GATTATCCGC CCCCCGGAGG AGCAAGGGAT ATATACANNT ANTATGTCAA 

501 AGGAACTTCA ACAAAAACAA AGAGTAATAT TGTTCCCCGA GCCCCATTTT 

551 CAGACCGCTG GCTAAAAGAA AATGCCGGTG CCGCCTCTGG TTTTTTCAGC 

601 CGTGCTGATG AAGCAGGAAA ACTGATATGG GAAAGCGACC CCAATAAAAA 

651 TTGGTGGGCT AACCGTATGG ATGATATTCG CGGCATCGTC CAAGGTGCGG 

701 TTAATCCTTT TTTAATGGGT TTTCAAGGAG TAGGGATTGG GGCAATTACA 

751 GACAGTGCAG TAAGCCCGGT CACAGATACA GCCGCGCAGC AGACTCTACA 

801 AGGTATNAAT CATTTAGGAA ANTTAAGTCC CGAAGCACAA CTTGCGGCTG 

851 CAACCGCATT ACAAGACAGT GCTTTTGCGG TAAAAGACGG TATCAATTCC 

901 GCCAGACAAT GGGCTGATGC CCATCCGAAT ATAACTGCAA CAGCCCAAAC 






951 TGCCCTTGCC GTAGCAGANG CCGCAACTAC GGTTTGGGGC GGTAAAAAAG 

1001 TAGAACTTAA CCCGACCAAA TGGGATTGGG TTAAAAATAC NGGCTATAAN 

1051 ACACCTGCTG TTCGCACCAT GCATACTTTG GATGGGGAAA TGGCCGGTGG 

1101 GAATAGACCG CCTAAATCTA TAACGTCCAA CAGCAAAGCA GATGCTTCCA 

1151 CACAACCGTC TTTACAAGCG CAACTAATTG GAGAACAAAT TANNNNNGGG 

1201 CATGCTTATA ACAAGCATGT CATAAGACAA CAAGAATTTA CGGATTTAAA 

1251 TATCAATTCA CCAGCAGATT TTGCTCGGCA TATTGAAAAT ATTGTTAGCC 

1301 ATCCANCAAA TATGAAAGAG TTACCTCGCG GTAGAACTGC GTATTGGGAT 

1351 NATAAAACAG GGACNATAGT TATCCGAGAT AAAAATTCTG ACGATGGAGG 

1401 TACAGCATTT AGACCAACAT CAGGTAAAAA ATATTATGAT GATTTATAG 

This encodes a protein having amino acid sequence <SEQ ID 168>:



1 MNXPIQKFMM LFAAAISXLQ IPISHANGLD ARLRDDMQAK HYEPGGKYHL
51 FGNARGSVKN RVYAVQTFDA TAVGPILPIT HERTGFEGII GYETHFSGHG
101 HEVHSPFDNH DSKSTSDFSG GVDGGFTVYQ LHRTGSEIHP EDGYDGPQGS
151 DYPPPGGARD IYXXYVKGTS TKTKSNIVPR APFSDRWLKE NAGAASGFFS
201 RADEAGKLIW ESDPNKNWWA NRMDDIRGIV QGAVNPFLMG FQGVGIGAIT
251 DSAVSPVTDT AAQQTLQGXN HLGXLSPEAQ LAAATALQDS AFAVKDGINS
301 ARQWADAHPN ITATAQTALA VAXAATTVWG GKKVELNPTK WDWVKNTGYX
351 TPAVRTMHTL DGEMAGGNRP PKSITSNSKA DASTQPSLQA QLIGEQIXXG
401 HAYNKHVIRQ QEFTDLNINS PADFARHIEN IVSHPXNMKE LPRGRTAYWD
451 XKTGTIVIRD KNSDDGGTAF RPTSGKKYYD DL*


ORF29a and ORF29-1 show 90.1% identity in 385 aa overlap: 



10 20 30 40 50 60 

orf29a.pep MNXPIQKFMMLFAAAISXLQIPISHANGLDARLRDDMQAKHYEPGGKYHLFGNARGSVKN 
ii lillllllllllll IIMIIIIIIIttllllllMIIMIIIIIIlinilllll: 
orf29-l MNLPIQKFMMLFAAAISLLQIPISHANGLDARLRDDMQAKHYEPGGKYHLFGNARGSVKK 

10 20 30 40 50 60 

70 80 90 100 110 120 

orf 29a . pep RVYAVQTFDATAVGPILPITHERTGFEGIIGYETHFSGHGHEVHSPFDNHDSKSTSDFSG 
t I I t I I I M I I I I : I : I I t I i I t i I I t I : I t I I I i I t M I i i I t M I I : t I I I 1 I i I i I I 
orf 29-1 RVYAVQTFDATAVSPVLPITHERTGFEGVIGYETHFSGHGHEVHSPFDHHDSKSTSDFSG 

70 80 90 100 110 120 

130 140 150 160 170 180 

orf 2 9a . pep gVDGGFTVYQLHRTGSEIHPEDGYDGPQGSDYPPPGGARDIYXXYVKGTSTKTKSNIVPR 
I I t t I M I M I t I I M I I t I I I I I I I I I I I I I I I I I I [) I I I lilllllllt:llll: 
orf 29-1 GVDGGFTVYQLHRTGSEIHPEDGYDGPQGSDYPPPGGARDIYSYYVKGTSTKTKTNIVPQ 

130 140 150 160 170 180 

190 200 210 220 230 240 

or f 2 9a . pep APFSDRWLKENAGAASGFFSRADEAGKLIWESDPNKNWWANRMDDIRGIVQGAVNPFLMG 
t I I i M I I I I t i I I I I I M I t I I I I I t I I I I I I It I t 1 I I I I i I i : I I I I I I I I I I I I M 
orf 29-1 APFSDRWLKENAGAASGFFSRADEAGKLIWESDPNKNWWANRMDDVRGIVQGAVNPFLMG 

190 200 210 220 230 240 



250 260 270 280 290 300 

orf 29a . pep FQGVGIGAITDSAVSPVTDTAAQQTLQGXNHLGXLSPEAQLAAATALQDSAFAVKDGINS 
I I I I I I M I I It I I i I I I I I i I I I I I I I I It I t i II t t I II : I I I t i II II t I I I i 
orf 2 9-1 FQGVGIGAITDSAVSPVTDTAAQQTLQGINDLGKLSPEAQLAAASLLQDSAFAVKDGINS 

250 260 270 280 290 300 

310 . 320 330 340 350 360 

or f 2 9a . pep ARQWADAHPNITATAQTAIAVAXAATTVWGGKKVEUJ PTKWDWVKNTGYXTPAVRTMHTL 

|:|||llllllltltittl::| tl III I t I 1 I II t I I II I t I I I i I ll:t l:lt 
orf 2 9- 1 AKQWADAHPNITATAQTALSAAEAAGTVWRGKKVELNPTKWDWVKNTGYKKPTU^RHMQTL 

310 320 330 340 350 360 



370 380 390 400 410 420 

orf 29a . pep DGEMAGGNRPPKSITSNSKADASTQPSLQAQLIGEQIXXGHAYNKHVIRQQEFTDLNINS 

I I I t I t t t : t I 1 : III: I 
orf 29-1 DGEMAGGNKPIKSLP-NSAAEKRKQNFEKFNSNWSSASFDSVHKTLTPNAPGILSPDKVK 

370 380 390 400 410

Homology with a predicted ORF from N.gonorrhoeae

ORF29 shows 88.8% identity over a 125aa overlap with a predicted ORF (ORF29.ng) from N.
gonorrhoeae:

orf29.pep VSPVLPITHERTGFEGVIGYETHFSGHGHE 30 

I : t: I I t t I I [ I t I I I I I I I I M I f I I I I I 
orf29ng EPGGKYHLFGNARGSVKNRVCAVQTFDATAVGPILPITHERTGFEGVIGYETHFSGHGHE 102 

orf 2 9 . pep VHSPFDHHDSKSTSDFSGGVDGGFTVYQLHRTWSEIHPEDEYDGPQAAXYPPPGGARDI Y 90 

lllllhlllllllltllllMilllllllll lllttll l[)tl:: llllliltlll 
orf29ng VHSPFDNHDSKSTSDFSGGVDGGFTVYQLHRTGSEIHPEDGYDGPQGGGYPPPGGARDIY 162 

orf 29. pep SYYVKGTSTKTKTSIVPQAPFSDRWLEENAGAASG 125 

: t t I I I I I ) I I I : I I M I 1 i t 
orf29ng SYHIKGTSTKTKINTVPQAPFSDRWLKENAGAASGFLSRADEAGKLIWENDPDKNWRANR 222 

The complete length ORF29ng nucleotide sequence <SEQ ID 169> is predicted to encode a protein 
having amino acid sequence <SEQ ID 170>: 

1 MNLPIQKFMM LFAAAISLLQ IPISHANGLD ARLRDDMQAK HYEPGGKYHL

51 FGNARGSVKN RVCAVQTFDA TAVGPILPIT HERTGFEGVI GYETHFSGHG

101 HEVHSPFDNH DSKSTSDFSG GVDGGFTVYQ LHRTGSEIHP EDGYDGPQGG

151 GYPPPGGARD IYSYHIKGTS TKTKINTVPQ APFSDRWLKE NAGAASGFLS

201 RADEAGKLIW ENDPDKNWRA NRMDDIRGIV QGAVNPFLTG FQGLGVGAIT 

251 DSAVSPVTYA AARKTLQGIH NLGNLSPEAQ LAAATALQDS AFAVKDSINS 

301 ARQWADAHPN ITATAQTALA VTEAATTVWG GKKVELNPAK WDWVKNTGYK 

351 KPAARHMQTV DGEMAGGNKP LESKNTVTTN NFFENTGYTE KVLRQASNGD 

401 YHGFPQSVDA FSENGTVIQI VGGDNIVRHK LYIPGSYKGK DGNFEYIREA 

451 DGKINHRLFV PNQQLPEK* 

In a second experiment, the following DNA sequence <SEQ ID 171> was identified: 



1 atgAATTTGC CTATTCAAAA ATTCATGATG ctgttggcAg cggcaatatc 

51 gatgctGCat ATCCCCATTA GTCATGCGAA CGGTTTGGAT GCCCGTTTGC 

101 GCGATGATAT GCAGGCAAAA CACTACGAAC CGGGTGGCAA ATACCATCTG 

151 TTTGGTAATG CTCGCGGCAG TGTTAAAAAT CGGGTTTGCG CCGTCCAAAC 

201 ATTTGATGCA ACTGCGGTCG GCCCCATACT GCCTATTACA CACGAACGGA 

251 CAGGATTTGA AGGTGTTATC GGCTATGAAA CCCATTTTTC AGGACACGGA 

301 CACGAAGTAC ACAGTCCGTT CGATAATCAT GATTCAAAAA GCACTTCTGA

351 TTTCAGCGGC GGCGTAGACG GCGGTTTTAC CGTTTACCAA CTTCATCGGA 

401 CAGGGTCGGA AATACATCCC GCAGACGGAT ATGACGGGCC TCAAGGCGGC 

451 GGTTATCCGG AACCACAAGG GGCAAGGGAT ATATACAGCT ACCATATCAA 

501 AGGAACTTCA ACCAAAACAA AGATAAACAC TGTTCCGCAA GCCCCTTTTT 

551 CAGACCGCTG GCTAAAAGAA AATGCCGGTG CCGCTTCCGG TTTTCTCAGC 

601 CGTGCGGATG AAGCAGGAAA ACTGATATGG GAAAACGACC CCGATAAAAA 

651 TTGGCGGGCT AACCGTATGG ATGATATTCG CGGCATCGTC CAAGGTGCGG 

701 TTAATCCTTT TTTAACGGGT TTTCAAGGGG TAGGGATTGG GGCAATTACA 

751 GACAGTGCGG TAAGCCCGGT CACAGATACA GCCGCTCAGC AGACTCTACA 

801 AGGTATTAAT GATTTAGGAA ATTTAAGTCC GGAAGCACAA CTTGCCGCCG 

851 CGAGCCTATT ACAGGACAGT GCCTTTGCGG TAAAAGACGG CATCAATTCC 

901 GCCAGACAAT GGGCTGATGC CCATCCGAAT ATAACAGCAA CAGCCCAAAC

951 TGCCCTTGCC GTAGCAGAGG CCGCAGGTAC GGTTTGGCGC GGTAAAAAAG 

1001 TAGAACTTAA CCCGACCAAA TGGGATTGGG TTAAAAATAC CGGCTATAAA 

1051 AAACCTGCTG CCCGCCATAT GCAGACTGTA GATGGGGAGA TGGCAGGGGG 

1101 GAATAGACCG CCTAAATCTA TAACGTCGGA AGGAAAAGCT AATGCTGCAA 

1151 CCTATCCTAA GTTGGTTAAT CAGCTAAATG AGCAAAACTT AAATAACATT 

1201 GCGGCTCAAG ATCCAAGATT GAGTCTAGCT ATTCATGAGG GTAAAAAAAA 

1251 TTTTCCAATA GGAACTGCAA CTTATGAAGA GGCAGATAGA CTAGGTAAAA 

1301 TTTGGGTTGG TGAGGGTGCA AGACAAACTA GTGGAGGCGG ATGGTTAAGT 

1351 AGAGATGGCA CTCGACAATA TCGGCCACCA ACAGAAAAAA AATCACAATT 

1401 TGCAACTACA GGTATTCAAG CAAATTTTGA AACTTATACT ATTGATTCAA 

1451 ATGAAAAAAG AAATAAAATT AAAAATGGAC ATTTAAATAT TAGGTAA

This encodes a protein having amino acid sequence <SEQ ID 172; ORF29ng-l>: 

1 MNLPIQKFMM LLAAAISMLH IPISHANGLD ARLRDDMQAK HYEPGGKYHL

51 FGNARGSVKN RVCAVQTFDA TAVGPILPIT HERTGFEGVI GYETHFSGHG

101 HEVHSPFDNH DSKSTSDFSG GVDGGFTVYQ LHRTGSEIHP ADGYDGPQGG

151 GYPEPQGARD IYSYHIKGTS TKTKINTVPQ APFSDRWLKE NAGAASGFLS

201 RADEAGKLIW ENDPDKNWRA NRMDDIRGIV QGAVNPFLTG FQGVGIGAIT

251 DSAVSPVTDT AAQQTLQGIN DLGNLSPEAQ LAAASLLQDS AFAVKDGINS

301 ARQWADAHPN ITATAQTALA VAEAAGTVWR GKKVELNPTK WDWVKNTGYK

351 KPAARHMQTV DGEMAGGNRP PKSITSEGKA NAATYPKLVN QLNEQNLNNI

401 AAQDPRLSLA IHEGKKNFPI GTATYEEADR LGKIWVGEGA RQTSGGGWLS

451 RDGTRQYRPP TEKKSQFATT GIQANFETYT IDSNEKRNKI KNGHLNIR* 



ORF29ng-l and ORF29-1 show 86.0% identity in 401 aa overlap: 



10 10 20 30 40 50 60 

orf 2 9ng-l . pep MNLPIQKFMMLIAAAISMLHIPISHAKGLDARLRDDMQAKHYEPGGKYHLFGNARGSVKN 
IIMIiltlM:|IIM:|:|IIIIMIIIIIIIIItlllttlll[illl[llllllll: 
orf 29-1 MNLPIQKFMMLFAAAISLLQIPISHANGLDARLRDDMQAKHYEPGGKYHLFGNARGSVKK 

10 20 30 40 50 60 


70 80 90 100 110 120 

orf29ng-l.pep RVCAVQTFDATAVGPILPITHERTGFEGVIGYETHFSGHGHEVHSPFDNHDSKSTSDFSG 
II lllllllfll:l:lilllllllltllltllltlllllllllllll:ltMIIIIIII 
or f 2 9- 1 RVYAVQTFDATAVSPVLPITHERTGFEGVIGYETHFSGHGHEVHSPFDHHDSKSTSDFSG 
20 70 80 90 100 110 120 



130 140 150 160 170 180 

orf29ng-l.pep GVt>GGFTVYQLHRTGSEIHPADGYDGPQGGGYPEPQGARDIYSYHIKGTSTKTKINTVPQ 
I I II II II I ) I t I It M I I I llllltll: It I I 1 I 1 I I II :: I I I It t 1 f I III 
25 orf 29-1 GVDGGETVYQLHRTGSEIHPEDGYDGPQGSDYPPPGGARDIYSYYVKGTSTKTKTNIVPQ 

130 140 150 160 170 180 



190 200 210 220 230 240 

orf 29ng-l . pep APFSDRWLKENAGAASGFLSRADEAGiCLIWENDPDKNWRANRMDDIRGIVQGAVNPFLTG 
30 I I I 1 I II I II I I f I I I I I : t I I I I I t t t I I I : t I : I 1 I I I I I I I : I 1 I I I I I I I I I I I 

orf 29-1 APFSDRWLKENAGAASGFFSRADEAGKLIWESDPNKNWWANRMDDVRGIVQGAVNPFLMG 

190 200 210 220 230 240 



250 260 270 280 290 300 

35 orf 29ng-l .pep FQGVGIGAITDSAVSPVTDTAAQQTLQGINDLGNLSPEAQLAAASLLQDSAFAVKDGINS 

M I I I t t I It I I I I I I 1 I I t I I t I I 1 11 I t I I I : t I I I I I I i It t I I I I I I I t I I I I I I I 
orf 29-1 FQGVGIGAITDSAVSPVTDTAAQQTLQGINDLGKLSPEAQLAAASLLQDSAFAVKDGINS 
250 260 270 280 290 300 






310 320 330 340 350 360 

orf 29ng-l . pep ARQWADAHPNITATAQTAIAVAEAAGTVWRGKKVELNPTKWDWVKNTGYKKPAARHMQTV 
t:|||lllllttlllllll::llllltllllllllllllllllltllllllttltHII: 
or f 2 9- 1 AKQWADAHPNITATAQTALSAAEAAGTVWRGKKVELNPTKWDWVKNTGYKKPAARHMQTL 

310 320 330 340 350 360 

370 380 390 400 410 419 

orf 29ng-l . pep DGEMAGGNRPPKSI-TSEGKANAATYPKLVNQUJEQNLNNIAAQDPRLSLAIHEGKKNFP 

I I I n III : I I I : : I : : :: |: :: : ::::: 
orf 2 9-1 DGEMAGGNKPIKSLPNSAAEKRKQNFEKFNSNWSSASFDSVHKTLTPNAPGILSPDKVKT 

370 380 390 400 410 420 



420 430 440 450 460 470 479 

orf 29ng-l . pep IGTATYEEADRLGKIWVGEGARQTSGGGWLSRDGTRQYRPPTEKKSQFATTGIQANFETY 

55 orf 29-1 RYTSLDGKITIIKDNENNYFRIHDNSRKQYLDSNGNAVKTGNLQGKQAKDYLQQQTHIRN 

430 440 450 460 470 480 

Based on this analysis, including the presence of a putative leader sequence in the gonococcal
protein, it is predicted that the proteins from N.meningitidis and N.gonorrhoeae, and their epitopes,
could be useful antigens for vaccines or diagnostics, or for raising antibodies.



Example 21 



The following partial DNA sequence was identified in N.meningitidis <SEQ ID 173>:

1 ATGAAAAAAC AAATCACCGC AGCCGTAATG ATGCTGTCTA TGATTGCCCC
51 CGCAATGGCA AACGGCTTGG ACAATCAGGC ATTTGAAGAC CAAATGTTCC
101 ACACGCGGGC AGATGCACCG ATGCAG...

This corresponds to the amino acid sequence <SEQ ID 174; ORF30>: 



1 MKKQITAAVM MLSMIAPAMA NGLDNQAFED QMFHTRADAP MQ. . 

Further work revealed the complete nucleotide sequence <SEQ ID 175>: 



1 ATGAAAAAAC AAATCACCGC AGCCGTAATG ATGCTGTCTA TGATTGCCCC 

51 CGCAATGGCA AACGGCTTGG ACAATCAGGC ATTTGAAGAC CAAGTGTTCC 

101 ACACGCGGGC AGATGCACCG ATGCAGTTGG CGGAGCTTTC TCAAAAGGAG 

151 ATGAAGGAGA CAGAGGGGGC GTTTCTTCCA TTGGCTATCT TGGGTGGTGC 

201 TGCCATTGGT ATGTGGACAC AGCATGGTTT TAGTTATGCA ACGACAGGCA 

251 GACCAGCTTC TGTTAGAGAT GTTGCTATTG CTGGCGGATT AGGCGCAATT 

301 CCTGGTGGTG TAGGCGCCGC AGGAAAGGTT GTTTCCTTTG CTAAATATGG

351 ACGTGAGATT AAAATCGGCA ATAATATGCG GATAGCCCCT TTCGGTAATA 

401 GAACAGGTCA TCCTATTGGA AAATTTCCCC ATTATCATCG TCGAGTTACG 

451 GATAATACGG GCAAGACTTT GCCTGGACAG GGAATTGGTC GTCATCGCCC 

501 TTGGGAATCA AAATCTACGG ACAGATCATG GAAAAACCGC TTCTAA 

This corresponds to the amino acid sequence <SEQ ID 176; ORF30-1>: 



1 MKKQITAAVM MLSMIAPAMA NGLDNQAFED QVFHTRADAP MQLAELSQKE 

51 MKETEGAFLP LAILGGAAIG MWTQHGFSYA TTGRPASVRD VAIAGGLGAI

101 PGGVGAAGKV VSFAKYGREI KIGNNMRIAP FGNRTGHPIG KFPHYHRRVT 

151 DNTGKTLPGQ GIGRHRPWES KSTDRSWKNR F* 
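
The relationship between SEQ ID 175 and SEQ ID 176 can be checked with a short translation routine. The following is a minimal sketch, assuming the standard genetic code and translation from the first base. The function name `translate` and the variable names are illustrative only and are not part of the specification. The fourth block of row 301 is read here as AGGAAAGGTT, by comparison with the strain A sequence <SEQ ID 177>. Any codon containing a symbol other than A, C, G or T is reported as X, even where the ambiguity would not change the residue, so the output for sequences containing N may show more X positions than the listings do.

```python
from itertools import product

# Standard genetic code (NCBI translation table 1), codons ordered T, C, A, G.
BASES = "TCAG"
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    "".join(codon): aa
    for codon, aa in zip(product(BASES, repeat=3), AMINO_ACIDS)
}


def translate(dna: str) -> str:
    """Translate from the first base, stopping after the first stop codon."""
    seq = "".join(dna.split()).upper()  # drop spaces between 10-base blocks
    protein = []
    for i in range(0, len(seq) - 2, 3):
        residue = CODON_TABLE.get(seq[i:i + 3], "X")  # ambiguous codon -> X
        protein.append(residue)
        if residue == "*":
            break
    return "".join(protein)


# SEQ ID 175 as listed above (row numbers removed).
SEQ_ID_175 = """
ATGAAAAAAC AAATCACCGC AGCCGTAATG ATGCTGTCTA TGATTGCCCC
CGCAATGGCA AACGGCTTGG ACAATCAGGC ATTTGAAGAC CAAGTGTTCC
ACACGCGGGC AGATGCACCG ATGCAGTTGG CGGAGCTTTC TCAAAAGGAG
ATGAAGGAGA CAGAGGGGGC GTTTCTTCCA TTGGCTATCT TGGGTGGTGC
TGCCATTGGT ATGTGGACAC AGCATGGTTT TAGTTATGCA ACGACAGGCA
GACCAGCTTC TGTTAGAGAT GTTGCTATTG CTGGCGGATT AGGCGCAATT
CCTGGTGGTG TAGGCGCCGC AGGAAAGGTT GTTTCCTTTG CTAAATATGG
ACGTGAGATT AAAATCGGCA ATAATATGCG GATAGCCCCT TTCGGTAATA
GAACAGGTCA TCCTATTGGA AAATTTCCCC ATTATCATCG TCGAGTTACG
GATAATACGG GCAAGACTTT GCCTGGACAG GGAATTGGTC GTCATCGCCC
TTGGGAATCA AAATCTACGG ACAGATCATG GAAAAACCGC TTCTAA
"""

ORF30_1 = translate(SEQ_ID_175)
print(ORF30_1)

# A fragment with two ambiguous bases (N) gives X at the affected codons.
print(translate("ATGAAGGANACAGNGGGG"))
```

Run on the sequence as printed, the routine ends at the TAA codon in the final row, which corresponds to the asterisk that closes SEQ ID 176.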

Computer analysis of this amino acid sequence gave the following results: 



Homology with a predicted ORF from N.meningitidis (strain A)

ORF30 shows 97.6% identity over a 42aa overlap with an ORF (ORF30a) from strain A of N.
meningitidis:

10 20 30 40 

orf 30 . pep MKKQITAAVMMLSMIAP7^4A NGLDNQAFEDQMFHTRADAPMQ 
I i I t I M I ( I I I I I I I I I I I I I I I I I I I t M : t I I I I 1 I I i i 
orf 30a MKKQITAAVMMLSMIAPAMA NGLDNQAFEDQVFHTRADAPMQLAELSQKEMKXTX GAFLP 

10 20 30 40 50 60 



or f 3 Oa LXILGGAAIGMWT QHGFSYATTGRPASVRDVAIAGGLGAI PGXVGAAGKWS FAKYGE^ I 

70 80 90 100 110 120 

The complete length ORF30a nucleotide sequence <SEQ ID 177> is:



1 ATGAAAAAAC AAATCACCGC AGCCGTAATG ATGCTGTCTA TGATTGCCCC 

51 CGCAATGGCA AACGGCTTGG ACAATCAGGC ATTTGAAGAC CAAGTGTTCC 

101 ACACGCGGGC AGATGCACCG ATGCAGTTGG CGGAGCTTTC TCAAAAGGAG 

151 ATGAAGGANA CAGNGGGGGC GTTTCTTCCA TTGGNTATCT TGGGTGGTGC 

201 TGCCATTGGT ATGTGGACAC AGCATGGTTT TAGTTATGCA ACGACAGGCA 

251 GACCAGCTTC TGTTAGAGAT GTTGCTATTG CTGGCGGATT AGGCGCAATT 

301 CCTGGTGNTG TAGGCGCCGC AGGAAAGGTT GTTTCCTTTG CTAAATATGG 

351 ACGTGAGATT AAAATCGGCA ATAATATGCG GATAGCCCCT TTCGGTAATA 

401 GAACAGGTCA TCCTATTGGN AAATTTCCCC ATTATCATCG TCGAGTTACG 

451 GATAATACGG GCAAGACTTT GCCTGGACAG GGAATTGGTC GTCATCGCCC 

501 TTGGGAATCA AAATCTACGG ACAGATCATG GAAAAACCGC TTCTAA 

This encodes a protein having amino acid sequence <SEQ ID 178>: 



1 MKKQITAAVM MLSMIAPAMA NGLDNQAFED QVFHTRADAP MQLAELSQKE

51 MKXTXGAFLP LXILGGAAIG MWTQHGFSYA TTGRPASVRD VAIAGGLGAI

101 PGXVGAAGKV VSFAKYGREI KIGNNMRIAP FGNRTGHPIG KFPHYHRRVT 

151 DNTGKTLPGQ GIGRHRPWES KSTDRSWKNR F* 

ORF30a and ORF30-1 show 97.8% identity in 181 aa overlap: 



orf30a.pep MKKQITAAVMMLSMIAPAMANGLDNQAFEDQVFHTRADAPMQLAELSQKEMKXTXGAFLP 60
           |||||||||||||||||||||||||||||||||||||||||||||||||||| | |||||
orf30-1    MKKQITAAVMMLSMIAPAMANGLDNQAFEDQVFHTRADAPMQLAELSQKEMKETEGAFLP 60

orf30a.pep LXILGGAAIGMWTQHGFSYATTGRPASVRDVAIAGGLGAIPGXVGAAGKVVSFAKYGREI 120
           | |||||||||||||||||||||||||||||||||||||||| |||||||||||||||||
orf30-1    LAILGGAAIGMWTQHGFSYATTGRPASVRDVAIAGGLGAIPGGVGAAGKVVSFAKYGREI 120

orf30a.pep KIGNNMRIAPFGNRTGHPIGKFPHYHRRVTDNTGKTLPGQGIGRHRPWESKSTDRSWKNR 180
           ||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
orf30-1    KIGNNMRIAPFGNRTGHPIGKFPHYHRRVTDNTGKTLPGQGIGRHRPWESKSTDRSWKNR 180

orf30a.pep FX
           ||
orf30-1    FX

Homology with a predicted ORF from N.gonorrhoeae

ORF30 shows 97.6% identity over a 42aa overlap with a predicted ORF (ORF30.ng) from N.
gonorrhoeae:

orf30.pep MKKQITAAVMMLSMIAPAMANGLDNQAFEDQMFHTRADAPMQ 42
          |||||||||||||||||||||||||||||||:||||||||||
orf30ng   MKKQITAAVMMLSMIAPAMANGLDNQAFEDQVFHTRADAPMQLAELSQKEMKETEGAFLP 60

The complete length ORF30ng nucleotide sequence <SEQ ID 179> is 



1 ATGAAAAAAC AAATCACCGC AGCCGTAATG ATGCTGTCTA TGATCGCCCC
51 CGCAATGGCA AACGGATTGG ACAATCAGGC ATTTGAAGAC CAAGTGTTCC
101 ACACGCGGGC AGATGCGCCG ATGCAGTTGG CGGAGCTTTC TCAGAAGGAG
151 ATGAAGGAGA CTGAAGGGGC TTTTCTTCCA TTGGCTATCT TGGGTGGTGC
201 TGCCATTGGT ATGTGGACAC AGCATGGTTT TAGTTATGCA ACGACAGGCA
251 GACCAGCTTC TGTTAGAGAT GTTGCTGGCG GATTAGGCGC AATTCCTGGT
301 GATGTAGGTG CTGCAGGAAA GGTTGTTTCC TTTGCTAAAT ATGGACGTGA
351 GATTAAAATC GGCAATAATA TGCGGATAGC CCCTTTCGGT AATAGAACAG
401 GTCATCCTAT TGGAAAATTT CCCCATTATC ATCGTCGAGT TACGGATAAT
451 ACGGGCAAGA CTTTGCCTGG ACAGGGAATT GGTCGTCATC GCCCTTGGGA
501 ATCAAAATCT ACGGACAGAT CATGGAAAAA CCGCTTCTAA



This encodes a protein having amino acid sequence <SEQ ID 180>:

1 MKKQITAAVM MLSMIAPAMA NGLDNQAFED QVFHTRADAP MQLAELSQKE

51 MKETEGAFLP LAILGGAAIG MWTQHGFSYA TTGRPASVRD VAGGLGAIPG

101 DVGAAGKVVS FAKYGREIKI GNNMRIAPFG NRTGHPIGKF PHYHRRVTDN

151 TGKTLPGQGI GRHRPWESKS TDRSWKNRF* 

ORF30ng and ORF30-1 show 98.3% identity in 181 aa overlap:

10 20 30 40 50 60 

orf 30ng . pep MKKQITAAVMMLSMIAPAMANGLDNQAFEDQVFHTRADAPMQLAELSQKEMKETEGAFLP 
It I I I I M I I I I I I i I I I I I I I i I I I I I t I I i t i i M I M I I I M I i 1 I I I I I I I I I I I t 
orf 30-1 MKKQITAAVMMLSMIAPAMANGLDNQAFEDQVFHTRADAPMQLAELSQKEMKETEGAFLP 

10 20 30 40 50 60 

70 80 90 100 110 

orf 30ng . pep LAILGGAAIGMWTQHGFSYATTGRPASVRDVA — GGLGAIPGDVGAAGKWS FAKYGRE I 
t I I I t I I I ! I I I I I M I i I i I I t I t I I I I I I I t t I I I I I i [ I I I I I I I I I I I I I I I I 
orf 30-1 LAILGGAAIGMWTQHGFSYATTGRPASVRDVAIAGGLGAIPGGVGAAGBCWSFAKYGREI 

70 80 90 100 110 120 

120 130 140 150 160 170 

orf 30ng . pep KIGNNMRIAPFGNRTGHPIGKFPHYHRRVTDNTGKTLPGQGIGRHRPWESKSTDRSWKNR 
I I I I I I t I I I I i I I I M i I I It I I I I I I I I I I I I I I I I I I I I I I I t I I i I I I I I I I I I I I 
orf 30-1 KIGNNMRIAPFGNRTGHPIGKFPHYHRRVTDNTGKTLPGQGIGRHRPWESKSTDRSWKNR 

130 140 150 160 170 180 

180 

orf30ng.pep FX 
1 I 

orf30-1 FX

Based on this analysis, including the presence of a putative leader sequence in the gonococcal
protein, it is predicted that the proteins from N.meningitidis and N.gonorrhoeae, and their epitopes,
could be useful antigens for vaccines or diagnostics, or for raising antibodies.
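
The identity figures quoted in this Example can be reproduced from the aligned sequences. The sketch below is illustrative only: it assumes that identity is counted as identical columns divided by all alignment columns, with gap columns included and the terminal stop position excluded, which is the convention that matches the quoted values. The function name `percent_identity` is hypothetical. The aligned ORF30ng and ORF30a strings are derived from ORF30-1 using the differences visible in the listings above (a two-residue gap and one G/D substitution for ORF30ng, and four X positions for ORF30a).

```python
def percent_identity(a: str, b: str) -> float:
    """Percent of alignment columns holding the same residue in both rows."""
    if len(a) != len(b):
        raise ValueError("aligned sequences must have equal length")
    matches = sum(1 for x, y in zip(a, b) if x == y and x != "-")
    return 100.0 * matches / len(a)


# ORF30-1 (SEQ ID 176) without the terminal stop, 181 residues.
ORF30_1 = (
    "MKKQITAAVM MLSMIAPAMA NGLDNQAFED QVFHTRADAP MQLAELSQKE "
    "MKETEGAFLP LAILGGAAIG MWTQHGFSYA TTGRPASVRD VAIAGGLGAI "
    "PGGVGAAGKV VSFAKYGREI KIGNNMRIAP FGNRTGHPIG KFPHYHRRVT "
    "DNTGKTLPGQ GIGRHRPWES KSTDRSWKNR F"
).replace(" ", "")

# ORF30ng aligned to ORF30-1: residues IA are absent and G103 becomes D.
ORF30NG_ALIGNED = ORF30_1.replace("VAIAGG", "VA--GG").replace("PGGVG", "PGDVG")

# ORF30a: X at positions 53, 55, 62 and 103 (1-based), otherwise identical.
orf30a = list(ORF30_1)
for pos in (53, 55, 62, 103):
    orf30a[pos - 1] = "X"
ORF30A = "".join(orf30a)

print(round(percent_identity(ORF30NG_ALIGNED, ORF30_1), 1))
print(round(percent_identity(ORF30A, ORF30_1), 1))
```

Under these assumptions the two calls correspond to the ORF30ng/ORF30-1 and ORF30a/ORF30-1 comparisons respectively. Other programs may treat gaps or X positions differently, so small deviations from the quoted percentages are possible with a different convention.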

Example 22 

The following partial DNA sequence was identified in N.meningitidis <SEQ ID 181>:

1 ATGAATAAAA CTCTCTATCG TGTAATTTTC AACCGCAAAC GTGGGGCTGT 

51 GrTAGCCGTT GCTGAAACTA CCAAGCGCGA AGGTAAAAGC TGTGCCGATA 

101 GTGATTCAGG CAGCGCTCAT GTGAAATCTG TTCCTTTTGG TACTACTCAT 

151 GCACCTGTTT GTg.CGTTaC AAATATCTTT TCTTTTTCTT TATTGGGCTT 

201 TTCTTTATGT TTGGCTGTAG GtacGGyCAA TATTGCTTTT GCTGATGGCA 

251 TT.. 

This corresponds to the amino acid sequence <SEQ ID 182; ORF31>:

1 MNKTLYRVIF NRKRGAVXAV AETTKREGKS CADSDSGSAH VKSVPFGTTH 
51 APVCXVTNIF SFSLLGFSLC LAVGTXNIAF ADGI.. 

Further work revealed a further partial nucleotide sequence <SEQ ID 183>:

1 ATGAATAAAA CTCTCTATCG TGTAATTTTC AACCGCAAAC GTGGGGCTGT 

51 GGTAGCCGTT GCTGAAACTA CCAAGCGCGA AGGTAAAAGC TGTGCCGATA 

101 GTGATTCAGG CAGCGCTCAT GTGAAATCTG TTCCTTTTGG TACTACTCAT 

151 GCACCTGTTT GTCGTTCAAA TATCTTTTCT TTTTCTTTAT TGGGCTTTTC 

201 TTTATGTTTG GCTGTAGGTA CGGCCAATAT TGCTTTTGCT GATGGCATT.. 

This corresponds to the amino acid sequence <SEQ ID 184; ORF31-1>:

1 MNKTLYRVIF NRKRGAVVAV AETTKREGKS CADSDSGSAH VKSVPFGTTH
51 APVCRSNIFS FSLLGFSLCL AVGTANIAFA DGI.. 

Computer analysis of this amino acid sequence gave the following results: 
Homology with a predicted ORF from N.gonorrhoeae

ORF31 shows 76.2% identity over a 84aa overlap with a predicted ORF (ORF31.ng) from N.
gonorrhoeae:

orf 31 . pep MNKTLYRVIFNRKRGAVXAVAETTKREGKSCADSDSGSAHVKSVPFGTTHAPVCXVTNIF 60 

I I I t I I t i t I M I I I I I I f i I t t I I I I I I I t I I I I I : : I t I [ ) II : : I 

orf31ng MNKTLYRVIFNRKRGAWAVAETTKREGKSCADSGSGSVYVKSVSFIPTH SKAF 54 

orf 31. pep SFSLLGFSLCLAVGTXNIAFADGI 84 

II 11111111:11 II II I i II 
Orf31ng CFSALGFSLCLALGTVNIAFADGIITDKAAPKTQQATILQTGNGIPQVNIQTPTSAGVSV 114 

The complete length ORF31ng nucleotide sequence <SEQ ID 185> is:

1 ATGAACAAAA CCCTCTATCG TGTGATTTTC AACCGCAAAC GCGGTGCTGT 

51 GGTAGCTGTT GCCGAAACCA CCAAGCGCGA AGGTAAAAGC TGTGCCGATA 

101 GTGGTTCGGG CAGCGTTTAT GTGAAATCCG TTTCTTTCAT TCCTACTCAT 

151 TCCAAAGCCT TTTGTTTTTC TGCATTAGGC TTTTCTTTAT GTTTGGCTTT 

201 GGGTACGGTC AATATTGCTT TTGCTGACGG CATTATTACT GATAAAGCTG 

251 CTCCTAAAAC CCAACAAGCC ACGATTCTGC AAACAGGTaa cGGCATACCG

301 CAAGTCAATA TTCAAACCCC TACTTCGGCA GGGGTTTCTG TTAATCAATA 

351 TGCCCAGTTT GATGTGGGTA ATCGCGGGGC GATTTTAAAC AACAGTCGCA 

401 GCAACACCCA AACACAGCTA GGCGGTTGGA TTCAAGGCAA TCCTTGGTTG

451 ACAAGGGGCG AAGCACGTGT GGTTGTAAAC CAAATCAACA GCAGCCATCC

501 TTCACAACTG AATGGCTATA TTGAAGTGGG TGGACGACGT GCAGAAGTCG 

551 TTATTGCCAA TCCGGCAGGG ATTGCAGTCA ATGGTGGTGG TTTTATCAAT 

601 GCTTCCCGTG CCACTTTGAC GACAGGCCAA CCGCAATATC AAGCAGGAGA 

651 CTTTAGCGGC TTTAAGATAA GGCAAGGCAA TGCTGTAATC GCCGGACACG 
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701 GTTTGGATGC CCGTGATACC GATTTCACAC GTATTCTTGT ATGCCAACAA 
751 AATCACCTTG ATCAGTACGG CCGAACAAGC AGGCATTCGT AA 

This encodes a protein having amino acid sequence <SEQ ID 186>: 

1 MNKTLYRVIF NRKRGAVVAV AETTKREGKS CADSGSGSVY VKSVSFIPTH
51 SKAFCFSALG FSLCLALGTV NIAFADGIIT DKAAPKTQQA TILQTGNGIP
101 QVNIQTPTSA GVSVNQYAQF DVGNRGAILN NSRSNTQTQL GGWIQGNPWL
151 TRGEARVVVN QINSSHPSQL NGYIEVGGRR AEVVIANPAG IAVNGGGFIN
201 ASRATLTTGQ PQYQAGDFSG FKIRQGNAVI AGHGLDARDT DFTRILVCQQ
251 NHLDQYGRTS RHS*

This gonococcal protein shares 50% identity over a 149aa overlap with the pore-forming
hemolysin-like HecA protein from Erwinia chrysanthemi (accession number L39897):



orf31ng  96 GNGIPQVNIQTPTSAGVSVNQYAQFDVGNRGAILNNSRSN-TQTQLGGWIQGNPWLTRGE 154
            GNG+P VNI TP ++G+S N+Y  F+V NRG ILNN  +  T +QLGG IQ NP L
HecA     45 GNGVPVVNIATPDASGLSHNRYHDFNVDNRGLILNNGTARLTPSQLGGLIQNNPNLNGRA 104

orf31ng 155 ARVVVNQINSSHPSQLNGYIEVGGRRAEVVIANPAGIAVNGGGFINASRATLTTGQPQYQ 214
            A  ++N++ S + S+L GY+EV G+ A VV+ANP GI  +G GF+N  R TLTTG PQ+
HecA    105 AAAILNEVVSPKRSRLAGYLEVAGQAANVVVANPYGITCSGCGFLNTPRLTLTTGTPQFD 164

orf31ng 215 -AGDFSGFKIRQGNAVIAGHGLDARDTDF 242
             AG  SG  +R G+ +I G GLDA  +D+
HecA    165 AAGGLSGLDVRGGDILIDGAGLDASRSDY 193





Furthermore, ORF31ng and ORF31-1 show 79.5% identity in 83 aa overlap:

10 20 30 40 50 60 

orf31-1.pep MNKTLYRVIFNRKRGAVVAVAETTKREGKSCADSDSGSAHVKSVPFGTTHAPVCRSNIFS

I I I I I t I I t I t I I I I I I t t M I I I I I I I I i i I I I t M : : I I I I I M I : I 
orf31ng     MNKTLYRVIFNRKRGAVVAVAETTKREGKSCADSGSGSVYVKSVSFIPTH SKAFC

10 20 30 40 50 

70 80 
orf31-1.pep FSLLGFSLCLAVGTANIAFADGI

II I I I I M I I: ) I : I M I n I I 

orf31ng FSALGFSLCLALGTVNIAFADGIITDKAAPKTQQATILQTGNGIPQVNIQTPTSAGVSVN 
60 70 80 90 100 110 

On this basis, including the homology with hemolysins, and also with adhesins, it is predicted that
the proteins from N.meningitidis and N.gonorrhoeae, and their epitopes, could be useful antigens
for vaccines or diagnostics, or for raising antibodies.



Example 23 

The following partial DNA sequence was identified in N.meningitidis <SEQ ID 187>:

1 ATGAATACTC CTCCTTTTGT CTGTTGGATT TTTTGCAAGG TCATCGACAA 

51 TTTCGGCGAC ATCGGCGTTT CGTGGCGGCT CGCCCGTGTT TTGCACCGCG 

101 AACTCGGTTG GCAGGTGCAT TTGTGGACGG ACGATGTGTC CGCCTTGCGT 

151 GCGCTTTGCC CTGATTTGCC CGATGTTCCC TGCGTTCATC AGGATATTCA 

201 TGTCCGCACT TGGCATTCCG ATGCGGCAGA TATTGATACC GCG.. 

This corresponds to the amino acid sequence <SEQ ID 188; ORF32>: 

1 MNTPPFVCWI FCKVIDNFGD IGVSWRLARV LHRELGWQVH LWTDDVSALR 
51 ALCPDLPDVP CVHQDIHVRT WHSDAADIDT A. . 

Further work revealed the complete nucleotide sequence <SEQ ID 189>: 

1 ATGAATACTC CTCCTTTTGT CTGTTGGATT TTTTGCAAGG TCATCGACAA 
51 TTTCGGCGAC ATCGGCGTTT CGTGGCGGCT CGCCCGTGTT TTGCACCGCG 
101 AACTCGGTTG GCAGGTGCAT TTGTGGACGG ACGATGTGTC CGCCTTGCGT 



151 GCGCTTTGCC CTGATTTGCC CGATGTTCCC TGCGTTCATC AGGATATTCA

201 TGTCCGCACT TGGCATTCCG ATGCGGCAGA TATTGATACC GCGCCTGTTC 

251 CCGATGTCGT CATCGAAACT TTTGCCTGCG ACCTGCCCGA AAATGTGCTG 

301 CACATTATCC GCCGACACAA GCCGCTTTGG CTGAATTGGG AATATTTGAG

351 CGCGGAGGAA AGCAATGAAA GGCTGCATCT GATGCCTTCG CCGCAGGAGG 

401 GTGTTCAAAA ATATTTTTGG TTTATGGGTT TCAGCGAAAA AAGCGGCGGG 

451 TTGATACGCG AACGTGATTA CTGCGAAGCC GTCCGTTTCG ATACTGAAGC 

501 CCTGCGAGAG CGGCTGATGC TGCCCGAAAA AAACGCCTCC GAATGGCTGC 

551 TTTTCGGCTA TCGGAGCGAT GTTTGGGCAA AGTGGCTGGA AATGTGGCGA 

601 CAGGCAGGCA GCCCGATGAC ACTGTTGCTG GCGGGGACGC AAATCATCGA 

651 CAGCCTCAAA CAAAGCGGCG TTATTCCGCA AGATGCCCTG CAAAACGACG 

701 GCGATGTTTT TCAGACGGCA TCCGTCCGCC TCGTCAAAAT CCCTTTCGTG 

751 CCGCAACAGG ACTTCGACCA ACTGCTGCAC CTTGCCGACT GCGCCGTCAT 

801 CCGCGGCGAA GACAGTTTCG TGCGCGCCCA GCTTGCGGGC AAACCCTTCT 

851 TTTGGCACAT CTACCCGCAA GACGAGAATG TCCATCTCGA CAAACTCCAC 

901 GCCTTTTGGG ATAAGGCACA CGGTTTCTAC ACGCCCGAAA CCGTGTCGGC 

951 ACACCGCCGT CTTTCGGACG ACCTCAACGG CGGAGAGGCT TTATCCGCAA 

1001 CACAACGCCT CGAATGTTGG CAAACCCTGC AACAACATCA AAACGGCTGG

1051 CGGCAAGGCG CGGAGGATTG GAGCCGTTAT CTTTTCGGGC AGCCGTCAGC 

1101 rCCTGAAAAA CTCGCTGCCT TTGTTTCAAA GCATCAAAAA ATACGCTAG 

This corresponds to the amino acid sequence <SEQ ID 190; ORF32-1>:



1 MNTPPFVCWI FCKVIDNFGD IGVSWRLARV LHRELGWQVH LWTDDVSALR

51 ALCPDLPDVP CVHQDIHVRT WHSDAADIDT APVPDVVIET FACDLPENVL

101 HIIRRHKPLW LNWEYLSAEE SNERLHLMPS PQEGVQKYFW FMGFSEKSGG

151 LIRERDYCEA VRFDTEALRE RLMLPEKNAS EWLLFGYRSD VWAKWLEMWR

201 QAGSPMTLLL AGTQIIDSLK QSGVIPQDAL QNDGDVFQTA SVRLVKIPFV

251 PQQDFDQLLH LADCAVIRGE DSFVRAQLAG KPFFWHIYPQ DENVHLDKLH

301 AFWDKAHGFY TPETVSAHRR LSDDLNGGEA LSATQRLECW QTLQQHQNGW

351 RQGAEDWSRY LFGQPSAPEK LAAFVSKHQK IR*

Computer analysis of this amino acid sequence gave the following results: 
Homology with a predicted ORF from N.meningitidis (strain A)

ORF32 shows 93.8% identity over a 81aa overlap with an ORF (ORF32a) from strain A of N.
meningitidis:

10 20 30 40 50 60 

orf 32 . pep MNTPPFVCWI FCKVIDNFGDIGVSWRLARVLHRELGWQVHLWTDDVSALRALCPDLPDVP 

MUM i M M M M M I M M M M M I M M M n M M M I M M I M M N 
orf 32a mNTPPFSAGXFCKVIDNFGDIGVSWRLARVLHRELGWQVHLWTDDVSALRALCPDLPDVX 

10 20 30 40 50 60 



70 80 
orf 32 . pep CVHQDIHVRTWHSDAADIDTA 
MMMMMMMMMMI 
orf 32a CVHQDIHVRTWHSDAADIDTAPVXDWIETFACDLPENVLHIIRRHKPLWLXWEYLSAEX 

70 80 90 100 110 120 

The complete length ORF32a nucleotide sequence <SEQ ID 191> is: 

1 ATGAATACTC CTCCTTTTTC TGCTGGANTT TTTTGCAAGG tcatcgacaa 

51 TTTCGGCGAC ATCGGCGTTT CGTGGCGGCT TGCCCGTGTT TTGCACCGCG 

101 AACTCGGTTG GCAGGTGCAT TTGTGGACGG ACGATGTGTC CGCCTTGCGT 

151 GCGCTTTGCC CTGATTTGCC CGATGTTCNC TGCGTTCATC AGGATATTCA 

201 TGTCCGCACT TGGCATTCCG ATGCGGCAGA TATTGATACC GCGCCTGTTC 

251 NCGATGTCGT CATCGAAACT TTTGCCTGCG ACCTGCCCGA AAATGTGCTG 

301 CACATCATCC GCCGACACAA GCCGCTTTGG CTGAANTGGG AATATTTGAG 

351 CGCGGAGGAN AGCAATGAAA GGCTGCACNT GATGCCTTCG CCGCAGGAGA 

401 GTGTTCNAAA ATANTTTTGG TTTATGGGTT TCAGCGAANN NAGCGGCGGA 

451 CTGATACGCG AACGCGATTA CTGCGAAGCC GTCCGTTTCG ATAGCGGAGC

501 CTTGCGCAAG AGGCTGATGC TTCCCGAAAA AAACGNCCCC GAATGGCTGC 

551 TTTTCGGCTA TCGGAGCGAT GTTTGGGCAA AGTGGCTGGA AATGTGGCGA 

601 CAGGCAGGCA GTCCGTTGAC ACTTTTGCTG GCNGGGGCGC ANATTATCGA 

651 CAGCCTCAAA CAAAACGGCG TTATTCCGCA AGATGCCCTG CAAAACGACG 

701 GCGATGTTTT TCAGACGGCA TCCGTCCGCC TCGTCAAAAT CCCTTTCGTG 

751 CCGCAACAGG ACTTCGACAA ACTGCTGCAC CTTGCCGACT GCGCCGTCAT 
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801 CCGCGGCGAA 

851 TTTGGCACAT 

901 GCCTTTTGGG 

951 ACACCGCCGC 

1001 CACAACGCCT 

1051 CGGCAAGGCG 

1101 ATCCGAAAAA 



GACAGTTTCG 
CTACCCGCAA 
ATAAGGCACA 
CTTTCAGACG 
CGAATGTTGG 
CGGAGGATTG 
CTCGCCGCCT 



TGCGCGCCCA 
GATGAGAATG 
CGGTTTCTAC 
ACCTCAACGG 
CAAATCCTGC 
GAGCCGTTAT 
TTGTTTCAAA 



GCTTGCGGGC 
TCCATCTCGA 
ACGCCCGAAA 
CGGAGAGGCT 
AACAACATCA 
CTTTTTGGGC 
GCATCAAAAA 



AAACCCTTCT 
CAAACTCCAC 
CCGCATCGGC 
TTATCCGCAA 
AAACGGCTGG 
AGCCTTCCGC 
ATACGCTAG 



This encodes a protein having amino acid sequence <SEQ ID 192>:



1 MNTPPFSAGX FCKVIDNFGD IGVSWRLARV LHRELGWQVH LWTDDVSALR
51 ALCPDLPDVX CVHQDIHVRT WHSDAADIDT APVXDVVIET FACDLPENVL
101 HIIRRHKPLW LXWEYLSAEX SNERLHXMPS PQESVXKXFW FMGFSEXSGG
151 LIRERDYCEA VRFDSGALRK RLMLPEKNXP EWLLFGYRSD VWAKWLEMWR
201 QAGSPLTLLL AGAXIIDSLK QNGVIPQDAL QNDGDVFQTA SVRLVKIPFV
251 PQQDFDKLLH LADCAVIRGE DSFVRAQLAG KPFFWHIYPQ DENVHLDKLH
301 AFWDKAHGFY TPETASAHRR LSDDLNGGEA LSATQRLECW QILQQHQNGW
351 RQGAEDWSRY LFGQPSASEK LAAFVSKHQK IR*



ORF32a and ORF32-1 show 93.2% identity in 382 aa overlap: 






10 20 30 40 50 60 

orf 32-1 . pep MNTPPFVCWIFCKVIDNFGDIGVSWRLARVLHRELGWQVHLWTDDVSALRALCPDLPDVP 

I M 1 11 1 1 n M I I I i t I I I I I I I M 1 I I I t I I i I I i I I I I i t I I I I I t I I I i I I 
orf 32a MNTPPFSAGXFCKVIDNFGDIGVSWRLARVLHRELGWQVHLWTDDVSALRALCPDLPDVX 

10 20 30 40 50 60 

70 80 90 100 110 120 

orf 32-1 . pep CVHQDIHVRTWHSDAADIDTAPVPDWIETFACDLPENVLHIIRRHKPLWLNWEYLSAEE 

1 1 1 1 1 1 1 1 1 1 i I i 1 1 1 1 1 1 1 1 1 1 iiitiiiiiiiiiitiiiiiiiiiiii Minn 

orf 32a CVHQDIHVRTWHSDAADIDTAPVXDWIETFACDLPENVLHIIRRHKPLWLXWEYLSAEX 
70 80 90 100 110 120 

130 140 150 160 170 180 

orf 32-1 . pep SNERLHLMPSPQEGVQKYFWFMGFSEKSGGLIRERDYCEAVRFDTEALRERLMLPEKNAS 

WWW tnnni i nnf ni nnnnnnnnn ninninn 

orf 32 a SNERLHXMPS PQESVXKXFWFMGFSEXSGGLIRERDYCEAVRFDSGALRKRLMLPEKNXP 

130 140 150 160 170 180 

190 200 210 220 230 240 

orf 32-1 . pep EWLLFGYRSDVWAKWLEMWRQAGSPMTLLLAGTQIIDSLKQSGVIPQDALQNDGDVFQTA 

n n n n n n n n I M i I n n : n M n : n n n i : n n n n u n i n n i 

orf 32a ewllfgyrsdvwakwlemwrqagspltlllagaxiidslkqngvipqdalqndgdvfqta 

190 200 210 220 230 240 

250 260 270 280 290 300 

orf 32-1 , pep SVRLWIPFVPQQDFIXJLLHrAIXIAVIRGEDSFVRAQLAGKPFFWHIYPQDENVHLDKLH 

n n n n n n n n : n n t n n I n I n n n M I n n n n M n n n n n i 

orf 32a SVRLVKIPFVPQQDFDKLLHLADCAVIRGEDSFVRAQLAGKPFFWHIYPQDENVHLDKLH 
250 260 270 280 290 300 

310 320 330 340 350 360 

orf 32-1 . pep AFWDKAHGFYTPETVSAHRRLSDDLNGGEALSATQRLECWQTLQQHQNGWRQGAEDWSRY 

nnnnnnnnnnnnnnnnnnnni nnnnnnnnn 

orf 32a AFWDKAHGFYTPETASAHRRLSDDLNGGEALSATQRLECWQILQQHQNGWRQGAEDWSRY 
310 320 330 340 350 360 

370 380 
orf 32-1 . pep LFGQPSAPEKLAAFVSKHQKIRX 

n n n I n n n n n n n i 

orf 32a LFGQPSASEKLAAFVSKHQKIRX 
370 380 

Homology with a predicted ORF from N.gonorrhoeae

ORF32 shows 95.1% identity over a 82aa overlap with a predicted ORF (ORF32.ng) from N.
gonorrhoeae:



orf32.pep MNTPPF-VCWIFCKVIDNFGDIGVSWRLARVLHRELGWQVHLWTDDVSALRALCPDLP 57

III t 1 11 1 n II II II n II n II n n II n n n II n I II II I II n n I n 
orf32ng MVMNTYAFPVCWIFCKVIDNFGDIGVSWRLARVLHRELGWQVHLWTDDVSALRALCPDLP 60

orf 32 .pep DVPCVHQDIHVRTWHSDAADIDTA 81 

III I I I I I M M t I I I I I II II I 
orf32ng DVPFVHQDIHVRTWHSDAADIDTAPVPDAVIETFACDLPENVLNIIRRHKPLWLNWEYLS 120 

An ORF32ng nucleotide sequence <SEQ ID 193> was predicted to encode a protein having amino 
acid sequence <SEQ ID 194>: 



1 MVMNTYAFPV CWIFCKVIDN FGDIGVSWRL ARVLHRELGW QVHLWTDDVS 

51 ALRALCPDLP DVPFVHQDIH VRTWHSDAAD IDTAPVPDAV IETFACDLPE

101 NVLNIIRRHK PLWLNWEYLS AEESNERLHL MPSPQEGVQK YFWFMGFSEK 

151 SGGLIRERDY REAVRFDTEA LRRRLVLPEK NAPEWLLFGY RGDVWAKWLD 

201 MWQQAGSLMT LLLAGAQIID SLKQSGVIPQ NALQNEGGVF QTASVRLVKI 

251 PFVPQQDFDK LLHLADCAVI RGEDSFVRTQ LAGKPFFWHI YPQDENVHLD 

301 KLHAFWDKAY GFYTPETASV HRLLSDDLNG GEALSATQRL ECGVL* 

Further sequencing revealed the following DNA sequence <SEQ ID 195>: 



1 ATGAATACAT ACGCTTTTCC TGTCTGTTGG ATTTTTTGCA AGGTCATCGA
51 CAATTTCGGC GACATCGGCG TTTCGTGGCG GCTCGCCCGT GTTTTGCACC
101 GCGAACTCGG TTGGCAGGTG CATTTGTGGA CGGACGACGT GTCCGCCTTG
151 CGCGCGCTTT GTCCCGATTT GCCCGATGTT CCCTTCGTTC ATCAGGATAT
201 TCATGTCCGC ACTTGGCATT CCGATGCGGC AGACATTGAT ACCGCGCCCG
251 TTCCCGATGC CGTTATCGAA ACTTTTGCCT GCGACCTGCC CGAAAATGTG
301 CTGAACATCA TCCGCCGACA CAAACCGCTT TGGCTGAATT GGGAATATTT
351 GAGCGCGGAG GAAAGCAATG AAAGGCTGCA CCTGATGCCT TCGCCGCAGG
401 AGGGCGTTCA AAAATATTTT TGGTTTATGG GTTTCAGCGA AAAAAGCGGC
451 GGGTTGATAC GCGAACGCGA TTACCGCGAA GCCGTCCGTT TCGATACCGA
501 AGCCCTGCGC CGGCGGCTGG TGCTGCCCGA AAAAAACGCC CCCGAATGGC
551 TGCTTTTCGG CTATCGGGGC GATGTTTGGG CAAAGTGGCT GGACATGTGG
601 CAACAGGCAG GCAGCCTGAT GACCCTACTG CTGGCGGGGG CGCAAATTAT
651 CGACAGCCTC AAACAAAGCG GCGTTATTCC GCAAAACGCC CTGCAAAAtg
701 aaggcgGTGT CTTTCagacG gcatccgTcC gccttGTCAA AAtcCCGTTC
751 GTGCcGCAAC AGGAcTTCGA CAAATTGCTG CAcctcgcCG ACTGCGCCGT
801 GATACGCGGC GAAGACAGTT TCGTGCGTAC CCAGCTTGCC GGAAAACCCT
851 TTTTTTGGCA CATCTACCCG CAAGACGAGA ATGTCCATCT CGACAAACTC
901 CACGCCTTTT GGGATAAGGC ATACGGCTTC TACACGCCCG AAACCGCATC
951 GGTGCACCGC CTCCTTTCGG ACGACCTCAA CGGCGGAGAG GCTTTATCCG
1001 CAACACAACG CCTCGAATGT TGGCAAACCC TGCAACAACA TCAAAACGGC
1051 TGGCGGCAAG GCGCGGAGGA TTGGAGCCGT TATCTTTTCG GGCAGCCTTC
1101 CGCATCCGAA AAACTCGCCG CCTTTGTTTC AAAGCATCAA AAAATACGCT
1151 AG

This encodes a protein having amino acid sequence <SEQ ID 196; ORF32ng-1>:



1 MNTYAFPVCW IFCKVIDNFG DIGVSWRLAR VLHRELGWQV HLWTDDVSAL 

51 RALCPDLPDV PFVHQDIHVR TWHSDAADID TAPVPDAVIE TFACDLPENV 

101 LNIIRRHKPL WLNWEYLSAE ESNERLHLMP SPQEGVQKYF WFMGFSEKSG

151 GLIRERDYRE AVRFDTEALR RRLVLPEKNA PEWLLFGYRG DVWAKWLDMW 

201 QQAGSLMTLL LAGAQIIDSL KQSGVIPQNA LQNEGGVFQT ASVRLVKIPF 

251 VPQQDFDKLL HLADCAVIRG EDSFVRTQLA GKPFFWHIYP QDENVHLDKL 

301 HAFWDKAYGF YTPETASVHR LLSDDLNGGE ALSATQRLEC WQTLQQHQNG 

351 WRQGAEDWSR YLFGQPSASE KLAAFVSKHQ KIR* 

ORF32ng-1 and ORF32-1 show 93.5% identity in 383 aa overlap:



10 20 30 40 50 59 

orf 32-1 . pep MNTPPF-VCWIFCKVIDNFGDIGVSWRLARVLHRELGWQVHLWTDDVSALRALCPDLPDV 
Ml I I I II 1 I I II I I I I I I I t I t I I I I I I I I II I II I I II I I I I I I I t I I I 1 I I I I i 
orf32ng-l MNTYAFPVCW I FCKV I DNFGD I GVSWRLARVLHRELGWQVHLWTDDVSALRALCPDLPDV 

10 20 30 40 50 60 



50 70 80 90 100 110 119 

orf 32-1 . pep PCVHQDIHVRTWHSDAADIDTAPVPDWIETFACDLPENVLHIIRRHKPLWLNWEYLSAE 
I I I II I I I I I II M I t I I I n I I I I : I I I I I I I I I I I I II : I I I t I I II I I i I II I I I I 
orf32ng-l PFVHQDIHVRTWHSDAADIDTAPVPDAVIETFACDLPENVLNIIRRHKPLWLNWEYLSAE 

70 80 90 100 110 120 



120 130 140 150 160 170 179 
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ESNERLHLMPSPQEGVQKYFWFMGFSEKSGGLIRERDYCEAVRFDTEALRERLMLPEKNA 
I I I I I I I I I I M I I I I I t I I I I I I I M I I I I i I I t t I I 

ESNERLHLMPSPQEGVQKYFWFMGFSEKSGGLIRERDYREAVRFDTEALRRRLVLPEKNA 
130 140 150 160 170 180 

180 190 200 210 220 230 239 

SEWLLFGYRSDVWAKWLEMWRQAGSPMTLLLAGTQIIDSLKQSGVIPQDALQNDGDVFQT 
I I I I I I I I: I I M t I I : I I: i I I 1 I I t I t I I : I I I I I t I I I t I I I I : I I I I : t llll 
PEWLLFGYRGDVWAKWLDMWQQAGSLMTLLLAGAQI I DSLKQSGVI PQNTVLQNEGGVFQT 
190 200 210 220 230 240 

240 250 260 270 280 290 299 

ASVRLVKIPFVPQQDFDQLLHLADCAVIRGEDSFVRAQLAGKPFFWHIYPQDENVHLDKL 
I t I I I I I I I I I I I I I I I : I I I I I I [ I I I I I I I I I I I : t I I I I I I I I I i I I I I i I I I I t I I 
ASVRLVKIPFVPQQDFDKLLHLADCAVIRGEDSFVRTQLAGKPFFWHIYPQDENVHLDKL 
250 260 270 280 290 300 

300 310 320 330 340 350 359 

HAFWDKAHGFYTPETVSAHRRLSDDLNGGEALSATQRLECWQTLQQHQNGWRQGAEDWSR 
I I t I I I I : I I I I M I: I :l I I I I I I I I I I I I I M i I i I I I I I i I I I i I I I I I I I i I I I I 
HAFWDKAYGFYTPETASVHRLLSDDLNGGEALSATQRL.ECWQTLQQHQNGWRQG7^DWSR 
310 320 330 340 350 360 

360 370 380 

YLFGQPSAPEKLAAFVSKHQKIRX 
I I I I 1 I I I I I M I I I 1 I I I I M I 
YLFGQPSASEKLAAFVSKHQKIRX 
370 380 
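As an illustration of how an identity figure such as the one quoted for this alignment can be checked, the following is a minimal sketch that counts identical columns in two aligned rows. The function name `percent_identity` and the choice to count gap columns in the denominator are assumptions made for this example; the alignment program used for the original analysis may treat gaps differently.

```python
def percent_identity(row_a: str, row_b: str, gap: str = "-") -> float:
    """Percentage of aligned columns in which both rows carry the same residue.

    Assumption: gap columns count towards the overlap length but never
    count as matches.
    """
    if len(row_a) != len(row_b):
        raise ValueError("aligned rows must have equal length")
    if not row_a:
        raise ValueError("aligned rows must not be empty")
    matches = sum(1 for a, b in zip(row_a, row_b) if a == b and a != gap)
    return 100.0 * matches / len(row_a)


# Final block of the alignment above: 24 columns with one P/S difference.
last_block = percent_identity("YLFGQPSAPEKLAAFVSKHQKIRX",
                              "YLFGQPSASEKLAAFVSKHQKIRX")
print(round(last_block, 1))
```

Applying the same function block by block and pooling the counts would give an overall figure comparable to, though not necessarily identical with, the value reported by the alignment program.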



On this basis, including the RGD sequence in the gonococcal protein, characteristic of adhesins, 
it is predicted that the proteins from N.meningitidis and N.gonorrhoeae, and their epitopes, could 
be useful antigens for vaccines or diagnostics, or for raising antibodies. 
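The RGD observation can be reproduced with a simple motif scan. The sketch below is illustrative only: `find_motif` is a hypothetical helper, and the two fragments are copied from the sequences shown above (residues 151-200 of SEQ ID 196 and the corresponding stretch of ORF32-1 from the alignment).

```python
from typing import List


def find_motif(sequence: str, motif: str = "RGD", start: int = 1) -> List[int]:
    """Return the position of every motif occurrence.

    Positions are 1-based and offset by `start`, the residue number of the
    first character of `sequence` within the full-length protein.
    """
    seq = sequence.upper()
    hits = []
    i = seq.find(motif)
    while i != -1:
        hits.append(start + i)
        i = seq.find(motif, i + 1)
    return hits


# Residues 151-200 of the gonococcal protein (SEQ ID 196).
gonococcal = "GLIRERDYREAVRFDTEALRRRLVLPEKNAPEWLLFGYRGDVWAKWLDMW"
# Corresponding meningococcal stretch, as shown in the alignment.
meningococcal = "SEWLLFGYRSDVWAKWLEMW"

print(find_motif(gonococcal, start=151))
print(find_motif(meningococcal))
```

In these fragments the motif is found in the gonococcal sequence only; the meningococcal stretch carries RSD at the equivalent position.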



ORF32-1 (42kDa) was cloned in pET and pGex vectors and expressed in E.coli, as described 
above. The products of protein expression and purification were analyzed by SDS-PAGE. Figure 
7A shows the results of affinity purification of the His-fusion protein, and Figure 7B shows the 
results of expression of the GST-fusion in E.coli. Purified His-fusion protein was used to immunise 
mice, whose sera were used for ELISA, giving a positive result. These experiments confirm that 
ORF32-1 is a surface-exposed protein, and that it is a useful immunogen. 
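As a rough plausibility check on the 42kDa figure, a protein mass can be estimated from its length. This is a back-of-the-envelope sketch only: the 110 Da average residue mass is a common rule of thumb, not a value taken from this document, and the length of 383 residues is assumed from the alignment overlap reported above. Fusion tags would add to the mass seen on a gel.

```python
AVERAGE_RESIDUE_MASS_DA = 110.0  # rule-of-thumb assumption, not a measured value


def approx_mass_kda(n_residues: int) -> float:
    """Approximate molecular mass in kDa from the number of residues."""
    return n_residues * AVERAGE_RESIDUE_MASS_DA / 1000.0


# 383 residues is an assumed length, taken from the alignment overlap.
print(f"{approx_mass_kda(383):.1f} kDa")
```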




Example 24 

The following partial DNA sequence was identified in N.meningitidis <SEQ ID 197>: 

1 . . TTGTTCCTGC GTGTNAAAGT GGGGCGTTTT TTCAGCAGTC CGGCGACGTG 

51 GTTTCGGGNC AAAGACCCTG TAAATCAGGC GGTGTTGCGG CTGTATNCGG 

101 ACGAGTGGCG GCA.ACTTCG GTACGTTGGA AAATAGNCGC AACGTCGCAC 

151 AGCCTGTGGC TCTGCACGCT GCTCGGAATG CTGGTGTCGG TATTGTTGCT 

201 GCTTTTGGTG CGGCAATATA CGTTCAACTG GGAAAGCACG CTGTTGAGCA 

251 ATGCCGCTTC GGTACGCGCG GTGGAAATGT TGGCATGGCT GCCGTCGAAA 

301 CTCGGTTTCC CTGTCCCCGA TGCGCGGTCG GTCATCGAAG GCCGTCTGAA 

351 CGGCAATATT GCCGATGCGC GGGCTTGGTC GGGGCTGCTG GTCGNCAGTA 

401 TCGCCTGCTA NGGCATCCTG CCGCGCCTG. . 

This corresponds to the amino acid sequence <SEQ ID 198; ORF33>: 

1 ..LFLRVKVGRF FSSPATWFRX KDPVNQAVLR LYXDEWRXTS VRWKIXATSH 
51 SLWLCTLLGM LVSVLLLLLV RQYTFNWEST LLSNAASVRA VEMLAWLPSK 
101 LGFPVPDARS VIEGRLNGNI ADARAWSGLL VXSIACXGIL PRL.. 
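The correspondence between a partial DNA sequence and its deduced amino acid sequence, including the X residues that arise from undetermined bases, can be sketched as follows. This is an illustrative example using the standard genetic code; the names `translate` and `translate_codon` are hypothetical, and the rule used here (an ambiguous codon is reported as X unless every possible reading gives the same amino acid) is an assumption about how the X positions arose, not a description of the software actually used.

```python
from itertools import product

BASES = "TCAG"
# Standard genetic code, codons ordered TTT, TTC, TTA, TTG, TCT, ... GGG.
AMINO = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    a + b + c: AMINO[16 * i + 4 * j + k]
    for i, a in enumerate(BASES)
    for j, b in enumerate(BASES)
    for k, c in enumerate(BASES)
}


def translate_codon(codon: str) -> str:
    """Translate one codon; any non-TCAG character is treated as unknown."""
    options = [base if base in BASES else BASES for base in codon]
    residues = {CODON_TABLE["".join(c)] for c in product(*options)}
    # Unambiguous only if every possible reading encodes the same residue.
    return residues.pop() if len(residues) == 1 else "X"


def translate(dna: str) -> str:
    """Translate in frame 1, ignoring whitespace and any trailing partial codon."""
    seq = "".join(dna.upper().split())
    usable = len(seq) - len(seq) % 3
    return "".join(translate_codon(seq[i:i + 3]) for i in range(0, usable, 3))


# First 30 bases of SEQ ID 197: GTN still reads as valine.
print(translate("TTGTTCCTGC GTGTNAAAGT GGGGCGTTTT"))
# A codon such as GNC cannot be resolved and is reported as X.
print(translate("TGGTTTCGGGNC"))
```

Note that an undetermined base in the third codon position does not always produce an X, which is why the deduced sequence can contain fewer X residues than the DNA contains N bases.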
Further work revealed the complete nucleotide sequence <SEQ ID 199>: 

1 ATGTTGAATC CATCCCGAAA ACTGGTTGAG CTGGTCCGTA TTTTGGACGA 

51 AGGCGGTTTT ATTTTCAGCG GCGATCCCGT ACAGGCGACG GAGGCTTTGC 

101 GCCGCGTGGA CGGCAGTACG GAGGAAAAAA TCATCCGTCG GGCGGAGATG 

151 ATTGACAGGA ACCGTATGCT GCGGGAGACG TTGGAACGTG TGCGTGCGGG 

201 GTCGTTCTGG TTGTGGGTGG TGGCGGCGAC GTTTGCATTT TTTACCGGTT 

251 TTTCAGTCAC TTATCTTCTA ATGGACAATC AGGGTCTGAA TTTCTTTTTG 

301 GTTTTGGCGG GCGTGTTGGG CATGAATACG CTGATGCTGG CAGTATGGTT 

351 GGCAATGTTG TTCCTGCGTG TGAAAGTGGG GCGTTTTTTC AGCAGTCCGG 

401 CGACGTGGTT TCGGGGCAAA GACCCTGTAA ATCAGGCGGT GTTGCGGCTG 

451 TATGCGGACG AGTGGCGGCA ACCTTCGGTA CGTTGGAAAA TAGGCGCAAC 

501 GTCGCACAGC CTGTGGCTCT GCACGCTGCT CGGAATGCTG GTGTCGGTAT 

551 TGTTGCTGCT TTTGGTGCGG CAATATACGT TCAACTGGGA AAGCACGCTG 

601 TTGAGCAATG CCGCTTCGGT ACGCGCGGTG GAAATGTTGG CATGGCTGCC 

651 GTCGAAACTC GGTTTCCCTG TCCCCGATGC GCGGGCGGTC ATCGAAGGCC 

701 GTCTGAACGG CAATATTGCC GATGCGCGGG CTTGGTCGGG GCTGCTGGTC 

751 GGCAGTATCG CCTGCTACGG CATCCTGCCG CGCCTGCTGG CTTGGGTAGT 

801 GTGTAAAATC CTTTTGAAAA CAAGCGAAAA CGGATTGGAT TTGGAAAAGC 

851 CCTATTATCA GGCGGTCATC CGCCGCTGGC AGAACAAAAT CACCGATGCG 

901 GATACGCGTC GGGAAACCGT GTCCGCCGTT TCACCGAAAA TCATCTTGAA 

951 CGATGCGCCG AAATGGGCGG TCATGCTGGA GACCGAGTGG CAGGACGGCG 

1001 AATGGTTCGA GGGCAGGCTG GCGCAGGAAT GGCTGGATAA GGGCGTTGCC 

1051 ACCAATCGGG AACAGGTTGC CGCGCTGGAG ACAGAGCTGA AGCAGAAACC 

1101 GGCGCAACTG CTTATCGGCG TGCGCGCCCA AACTGTGCCG GACCGCGGCG 

1151 TGTTGCGGCA GATTGTCCGA CTCTCGGAAG CGGCGCAGGG CGGCGCGGTG 

1201 GTGCAGCTTT TGGCGGAACA GGGGCTTTCA GACGACCTTT CGGAAAAGCT 

1251 GGAACATTGG CGTAACGCGC TGGCCGAATG CGGCGCGGCG TGGCTTGAGC 

1301 CTGACAGGGC GGCGCAGGAA GGGCGTTTGA AAGACCAATA A 

This corresponds to the amino acid sequence <SEQ ID 200; ORF33-1>: 



1 MLNPSRKLVE LVRILDEGGF IFSGDPVQAT EALRRVDGST EEKIIRRAEM 

51 IDRNRMLRET LERVRAGSFW LWVVAATFAF FTGFSVTYLL MDNQGLNFFL 

101 VLAGVLGMNT LMLAVWLAML FLRVKVGRFF SSPATWFRGK DPVNQAVLRL 

151 YADEWRQPSV RWKIGATSHS LWLCTLLGML VSVLLLLLVR QYTFNWESTL 

201 LSNAASVRAV EMLAWLPSKL GFPVPDARAV IEGRLNGNIA DARAWSGLLV 

251 GSIACYGILP RLLAWVVCKI LLKTSENGLD LEKPYYQAVI RRWQNKITDA 

301 DTRRETVSAV SPKIILNDAP KWAVMLETEW QDGEWFEGRL AQEWLDKGVA 

351 TNREQVAALE TELKQKPAQL LIGVRAQTVP DRGVLRQIVR LSEAAQGGAV 

401 VQLLAEQGLS DDLSEKLEHW RNALAECGAA WLEPDRAAQE GRLKDQ* 

Computer analysis of this amino acid sequence gave the following results: 



Homology with a predicted ORF from N.meningitidis (strain A) 

ORF33 shows 90.9% identity over a 143aa overlap with an ORF (ORF33a) from strain A of 
N.meningitidis: 

10 20 30 

orf 33 . pep LFLRVKVGRFFSSPATWFRXKDPVNQAVLR 

t I I 1 I I I I I I I I I I i I I I I I I I I M M I I 
orf 33a LMDNQGLN FFLVLAGVXGMNTLMLAVW LAMLFLRVKVGRFFSSPATWFRGKDPVNQAVLR 
90 100 110 120 130 140 



40 50 60 70 80 90 

or f 33 . pep LYXDEWRXTSVRWKIXATSHSLW LCTLLGMLVSVLLLLLV RQYTFNWESTLLSNAASVRA 

II Mill lltlll IIIIIIIIMIItllllMllllllllllllllllt::::III 
orf 33a LYADEWRXPSVRWKIGATSHSLW LCTLLGMLVSVLLLLLV RQYTFNWESTLLGDSSSVRL 
150 160 170 180 190 200 



100 110 120 130 140 

orf 33 .pep VEMLAWLPSKLGFPVPDARSVIEGRLNGNIADARAWSG LLVXSIACXGILPRL 
I I I I II I I : I I I t I I I I I I : I I I I I II I I I II I I I It I I I I I I I I lltlll 
orf 33a VEMLAWLPAKLGFPVPDARAVIEGRLNGNIADARAWSG LLVGSIACYGILPRLLAW AVCK 
210 220 230 240 250 260 



orf 33a 



ILXXTSENGLDLEKXXXXXXIRRWQNKITDADTRRETVSAVSPKIVLNDAPKWAVMLETE 
270 280 290 300 310 320 
The complete length ORF33a nucleotide sequence <SEQ ID 201> is: 

1 ATGTTGAATC CATCCCGAAA ACTGGTTGAG CTGGTCCGTA TTTTGGAAGA 

51 AGGCGGCTTT ATTTTCAGCG GCGATCCCGT GCAGGCGACG GAGGCTTTGC 

101 GCCGCGTGGA CGGCAGTACG GAGGAAAAAA TCATCCGTCG GGCGAAGATG 

151 ATCGACAGGA ACCGTATGCT GCGGGAGACG TTGGAACGTG TGCGTGCGGG 

201 GTCGTTCTGG TTGTGGGTGG CGGCGGCGAC GTTTGCGTTT NTTACCGNTT 

251 TTTCAGTTAC TTATCTTCTA ATGGACAATC AGGGTCTGAA TTTCTTTTTG 

301 GTTTTGGCGG GCGTGNTGGG CATGAATACG CTGATGCTGG CAGTATGGTT 

351 GGCAATGTTG TTCCTGCGCG TGAAAGTGGG GCGTTTTTTC AGCAGTCCGG 

401 CGACGTGGTT TCGGGGCAAA GACCCTGTCA ATCAGGCGGT GTTGCGGCTG 

451 TATGCGGACG AGTGGCGGCN ACCTTCGGTA CGTTGGAAAA TAGGCGCAAC 

501 GTCGCACAGC CTGTGGCTCT GCACGCTGCT CGGAATGCTG GTGTCGGTAT 

551 TGTTGCTGCT TTTGGTGCGG CAATATACGT TCAACTGGGA AAGCACGCTG 

601 TTGGGCGATT CGTCTTCGGT ACGGCTGGTG GAAATGTTGG CATGGCTGCC 

651 TGCGAAACTG GGTTTTCCCG TGCCTGATGC GCGGGCGGTC ATCGAAGGTC 

701 GTCTGAACGG CAATATTGCC GATGCGCGGG CTTGGTCGGG GCTGCTGGTC 

751 GGCAGTATCG CCTGCTACGG CATCCTGCCG CGCCTCTTGG CTTGGGCGGT 

801 ATGCAAAATC CTTNTGNAAA CAAGCGAAAA CGGCTTGGAT TTGGAAAAGC 

851 NCNNNNNTCN NNCGNTCATC CGCCGCTGGC AGAACAAAAT CACCGATGCG 

901 GATACGCGTC GGGAAACCGT GTCCGCCGTT TCGCCGAAAA TCGTCTTGAA 

951 CGATGCGCCG AAATGGGCGG TCATGCTGGA GACCGAATGG CAGGACGGCG 

1001 AATGGTTCGA GGGCAGGCTG GCGCAGGAAT GGCTGGATAA GGGCGTTGCC 

1051 GCCAATCGGG AACAGGTTGC CGCGCTGGAG ACAGAGCTGA AGCAGAAACC 

1101 GGCGCAACTG CTTATCGGCG TGCGCGCCCA AACTGTGCCC GACCGCGGCG 

1151 TGTTGCGGCA GATCGTCCGA CTTTCGGAAG CGGCGCAGGG CGGCGCGGTG 

1201 GTGCANCTTT TGGCGGAACA GGGGCTTTCA GACGACCTTT CGGAAAAGCT 

1251 GGAACATTGG CGTAACGCGC TGACCGAATG CGGCGCGGCG TGGCTGGAAC 

1301 CCGACAGAGC GGCGCAGGAA GGCCGTCTGA AAACCAACGA CCGCACTTGA 

This encodes a protein having amino acid sequence <SEQ ID 202>: 



1 MLNPSRKLVE LVRILEEGGF IFSGDPVQAT EALRRVDGST EEKIIRRAKM 
51 IDRNRMLRET LERVRAGSFW LWVAAATFAF XTXFSVTYLL MDNQGLNFFL 
101 VLAGVXGMNT LMLAVWLAML FLRVKVGRFF SSPATWFRGK DPVNQAVLRL 
151 YADEWRXPSV RWKIGATSHS LWLCTLLGML VSVLLLLLVR QYTFNWESTL 
201 LGDSSSVRLV EMLAWLPAKL GFPVPDARAV IEGRLNGNIA DARAWSGLLV 
251 GSIACYGILP RLLAWAVCKI LXXTSENGLD LEKXXXXXXI RRWQNKITDA 
301 DTRRETVSAV SPKIVLNDAP KWAVMLETEW QDGEWFEGRL AQEWLDKGVA 
351 ANREQVAALE TELKQKPAQL LIGVRAQTVP DRGVLRQIVR LSEAAQGGAV 
401 VXLLAEQGLS DDLSEKLEHW RNALTECGAA WLEPDRAAQE GRLKTNDRT* 



ORF33a and ORF33-1 show 94.1% identity in 444 aa overlap: 



10 20 30 40 50 60 

orf33a.pep MLNPSRKLVELVRILEEGGFIFSGDPVQATEALRRVDGSTEEKIIRRAKMIDRNRMLRET 
lllillllilillll:IMIIIIIIIIIIIIIIIIIIIItllltilM:|ltilll[i)l 
orf33-l MLNPSRKLVELVRILDEGGFIFSGDPVQATEALRRVDGSTEEKIIRRAEMIDRNRMLRET 

10 20 30 40 50 60 



70 80 90 100 110 120 

orf 33a . pep LERVRAGSFWLWVAAATFAFXTXFSVTYLmDNQGLNFFLVLAGVXGMNTLMLAVWLAML 
1 1 i 1 1 1 1 i 1 1 n t : i I [ ) I I I I i I I i I I I I I I I t I I I I [ I i I I I M I I I I I I I I I I I 
or f 33- 1 LERVRAGSFWLWVVAATFAFFTGFSVTYLLMDNQGLNFFLVLAGVLGMNTLMLAVWLAML 

70 80 90 100 110 120 



130 140 150 160 170 180 

orf 33a . pep FLRVKVGRFFSSPATWFRGKDPVNQAVLRLYADEWRXPSVRWKIGATSHSLWLCTLLGML 
I M i I I M I I 1 I I I i I [ I I I I I I t t I I I I I I I i I i I I I I t M I I i I I I M M I I I I t I I 
orf 33-1 FLRVKVGRFFSSPATWFRGKDPVNQAVLRLYADEWRQPSVRWKIGATSHSLWLCTLLGML 

130 140 150 160 170 180 



190 200 210 220 230 240 

orf 33a . pep VSVLLLLLVRQYTFNWESTLLGDSSSVRLVEMLAWLPAKLGFPVPDARAVIEGRLNGNIA 
I M i i It I 1 I I I I I I I I I I t t : : : : t I I I I I M I I I : i I M I I I I I i i I I I I I M I 1 t i 
orf 33-1 VSVLLLLLVRQYTFNWESTLLSNAASVRAVEMLAWLPSKLGFPVPDARAVIEGRLNGNIA 

190 200 210 220 230 240 



orf 3 3a. pep 



250 260 270 280 290 300 

DARAWSGLLVGSXACYGILPRLLAWAVCKILXXTSENGLDLEKXXXXXXIRRWQNKITDA 
I I M I I 1 t I i M I I I I I i i t I t I I 1 : 1! I I I t I I I I M I 1 I ) I i I I I I I M I 
orf33-1 DARAWSGLLVGSIACYGILPRLLAWWCKILLKTSENGLDLEKPYYQAVIRRWQNKITDA 

250 260 270 280 290 300 

310 320 330 340 350 360 

or f 3 3a . pep DTRRETVSAVSPKI VLNDAPKWAVMLETEWQDGEWFEGRLAQEWLDKGVAANREQVAALE 

I I I I I I t I I I I I i I : t i I I I I I I I It t I I I t t I I I I t I t I t I I I I I I I I I : M I I I I I I I 
or f 3 3 - 1 DTRRETVSAVSPKI I LNDAPKWAVMLETEWQDGEWFEGRLAQEWLDKGVATNREQVAALE 

310 320 330 340 350 360 

370 380 390 400 410 420 

orf33a.pep TELKQKPAQLLIGVRAQTVPDRGVLRQIVRLSEAAQGGAWXLLAEQGLSDDLSEKLEHW 
I i I I I I i I I i I I I I I I I I i I I I I I I I I t I I I I I I t I I I I I I I I I I I I I I I I I I I I I I I I 
orf33-l TELKQKPAQLLIGVRAQTVPDRGVLRQIVRLSEAAQGGAWQLLAEQGLSDDLSEKLEHW 

370 380 390 400 410 420 

430 440 450 

or f 33a . pep RNALTECGAAWLEPDRAAQEGRLKTNDRTX 

I I I I: I I I i i I I I I I I I I I I I I I t 
o r f 3 3 - 1 RNALAECGAAWLEPDRAAQEGRLKDQX 

430 440 

Homology with a predicted ORF from N.gonorrhoeae 

ORF33 shows 91.6% identity over a 143aa overlap with a predicted ORF (ORF33.ng) from 
N.gonorrhoeae: 

orf33.pep 
orf33ng 
orf33 .pep 
orf 33ng 
orf 33 . pep 
orf 33ng 



LFLRVKVGRFFS S PATWFRXKDPVNQAVLR 3 0 
I I I M I t I M I I I I I I i i I I I t I I I I t t 

LMDNQGLNFFLVLAGVLGMNTLMLAVWLATLFLRVKVGRFFSSPATWFRGKGPVNQAVLR 100 

LYXDEWRXTSVRWKIXATSHSLWLCTLLGMLVSVLLLLLVRQYTFNWESTLLSNAASVRA 90 
II t : I i I I I I t I I I : t M I I I I II I I I I M I i i I II I t II II I I I i II i i I t I I I i 

LYADQWRQPSVRWKIGATAHSLWLCTLLGMLVSVLLLLLVRQYTFNWESTLLSNAASVRA 160 

VEMLAWLPSKLGFPVPDARSVIEGRLNGNIADARAWSGLLVXSIACXGILPRL 143 

||||llllllllilllll[:llllinilllllllllMII \\'\ ttllll 

VEMLAWLPSKLGFPVPDT^VIEGRLNGNIADARAWSGLLVGSIVCYGILPRLLAWWCK 220 



An ORF33ng nucleotide sequence <SEQ ID 203> was predicted to encode a protein having amino 
acid sequence <SEQ ID 204>: 



1 MIDRDRMLRD TLERVRAGSF WLWVVVASMM FTAGFSGTYL LMDNQGLNFF 
51 LVLAGVLGMN TLMLAVWLAT LFLRVKVGRF FSSPATWFRG KGPVNQAVLR 
101 LYADQWRQPS VRWKIGATAH SLWLCTLLGM LVSVLLLLLV RQYTFNWEST 
151 LLSNAASVRA VEMLAWLPSK LGFPVPDARA VIEGRLNGNI ADARAWSGLL 
201 VGSIVCYGIL PRLLAWVVCK ILLKTSENGL DLEKTYYQAV IRRWQNKITD 
251 ADTRRETVSA VSPKIVLNDA PKWALMLETE WQDGQWFEGR LAQEWLDKGV 
301 AANREQVAAL ETELKQKPAQ LLIGVRAQTV PDRGVLRQIV RLSEAAQGGA 
351 VVQLLAEQGL SDDLSEKLEH WRNALTECGA AWLEPDRVAQ EGRLKDQ* 



Further sequence analysis revealed the following DNA sequence <SEQ ID 205>: 



1 ATGTTGaatC CATCCCgaAA ACTGgttgag ctGgTCCgtA Ttttgaataa 
51 agggggtTTT attttcagcg gcgatcctgt gcaggcgacg gaggctttgc 
101 gccgcgtgga cggcAGTACG GAggAaaaaa tcttccgtcg GGCGGAGAtg 
151 atcgACAGGg accgtatgtt gcgggACaCg TtggaacGTG TGCGTGCggg 
201 gtcgtTctgG TTATGGGTGG TggtggCAtC gATGATGTtt aCCGCCGGAT 
251 TTTCAGgcac ttatcttCTG ATGGACaatC AGGGGCtGAA TtTCTTTTTA 
301 GTTTTggcgG GAGTGTtggG CATGaatacG CtgATGCTGG CAGTATGGtt 
351 gGCAACGTTG TTCCTGCGCG TGAAAGTGGG ACGGTTTTTC AGCAGTCCGG 
401 CGACGTGGTT TCGGGGCAAA GGCCCTGTAA ATCAGGCGGT GTTGCGGCTG 
451 TATGCGGACC AGTGGCGGCA ACCTTCGGTA CGATGGAAAA TAGGCGCAAC 
501 GGCGCACAGC TTGTGGCTCT GCACGCTGCT CGGAATGCTG GTGTCGGTAT 
551 TGCTGCTGCT TTTGGTGCGG CAATATACGT TCAACTGGGA AAGCACGCTG 
601 TTGAGCAATG CCGCTTCGGT ACGCGCGGTG GAAATGTTGG CATGGCTGCC 
651 GTCGAAACTC GGTTTCCCTG TCCCCGATGC GCGGGCGGTC ATCGAAGGTC 
701 GTCTGAACGG CAATATTGCC GATGCGCGGG CTTGGTCGGG GCTGCTGGTC 
751 GGCAGTATCG TCTGCTACGG CATCCTGCCG CGCCTCTTGG CTTGGGTAGT 
801 GTGTAAAATC CTTTTGAAAA CAAGCGAAAA CGGattgGAT TTGGAAAAAA 
851 CCTATTATCA GGCGGTCATC CGCCGCTGGC AGAACAAAAT CACCGATGCG 
901 GATACGCGTC GGGAAACCGT GTCCGCCGTT TCGCcgaAAA TCGTCTTGAA 
951 CGATGCGCCG AAATGGGCGC TCATGCTGGA GACCGAGTGG CAGGACGGCC 
1001 AATGGTTCGA GGGCAGGCTG GCGCAGGAAT GGCTGGATAA GGGCGTTGCC 
1051 GCCAATCGGG AACAGGTTGC CGCGCTGGAG ACAGAGCTGA AGCAGAAACC 
1101 GGCGCAACTG CTTATCGGCG TACGCGCCCA AACTGTGCCG GACCGGGGCG 
1151 TGCTGCGGCA GATTGTGCGG CTTTCGGAAG CGGCGCAGGG CGGCGCGGTG 
1201 GTGCAGCTTT TGGCGGAACA GGGGCTTTCA GACGACCTTT CGGAAAAGCT 
1251 GGAACATTGG CGTAACGCGC TGACCGAATG CGGCGCGGCG TGGCTTGAGC 
1301 CTGACAGGGT GGCGCAGGAA GGCCGTTTGA AAGACCAATA A 



This encodes a protein having amino acid sequence <SEQ ID 206; ORF33ng-1>: 



1 MLNPSRKLVE LVRILNKGGF IFSGDPVQAT EALRRVDGST EEKIFRRAEM 
51 IDRDRMLRDT LERVRAGSFW LWVVVASMMF TAGFSGTYLL MDNQGLNFFL 
101 VLAGVLGMNT LMLAVWLATL FLRVKVGRFF SSPATWFRGK GPVNQAVLRL 
151 YADQWRQPSV RWKIGATAHS LWLCTLLGML VSVLLLLLVR QYTFNWESTL 
201 LSNAASVRAV EMLAWLPSKL GFPVPDARAV IEGRLNGNIA DARAWSGLLV 
251 GSIVCYGILP RLLAWVVCKI LLKTSENGLD LEKTYYQAVI RRWQNKITDA 
301 DTRRETVSAV SPKIVLNDAP KWALMLETEW QDGQWFEGRL AQEWLDKGVA 
351 ANREQVAALE TELKQKPAQL LIGVRAQTVP DRGVLRQIVR LSEAAQGGAV 
401 VQLLAEQGLS DDLSEKLEHW RNALTECGAA WLEPDRVAQE GRLKDQ* 



ORF33ng-1 and ORF33-1 show 94.6% identity in 446 aa overlap: 






10 20 30 40 50 60 

orf 33-1 . pep MLNPSRKLVELVRILDEGGFIFSGDPVQATEALRRVDGSTEEKIIRRAEMIDRNRMLRET 
|||||inillfMI::IUIIIIIttlllllltllllIlllil:lll]lllt:llll:1 
orf33ng-l MLNPSRKLVELVRI LNKGGFI FSGDPVQATEALRRVDGSTEEK I FRRAEMI DRDRMLRDT 

10 20 30 40 50 60 






70 80 90 100 110 120 

orf 33-1 . pep LERVRAGSFWLWVVAATFAFFTGFSVTYLLMDNQGLNFFLVIAGVI^MNTIJIIAVWLAML 
linilttllllil:!:: I I I I 1 I t I I i M I I I M I i I t t I I I I M I I I i I I 

orf33ng-l LERVRAGSFWLWVVVASMMFTAGFSGTYLIilDNQGLNFFLVIAGVLGMNTIilLAVWLATL 

70 80 90 100 110 120 






130 140 150 160 170 180 

orf 33-1. pep FLRVKVGRFFSSPATWFRGKDPVNQAVLRLYADEWRQPSVRWKIGATSHSLWLCTLLGML 
M I t I [ I I 1 I I I I I i I I I I I lllllllltlll:llllllltlllll:IIIMIil)lll 
Orf33ng-l FLRVKVGRFFSSPATWFRGKGPVNQAVLRLYADQWRQPSVRWKIGATAHSLWLCTLLGML 

130 140 150 160 170 180 

190 200 210 220 230 240 

orf 33-1 . pep VSVLLLLLVRQYTFNWESTLLSNAASVRAVEMLAWLPSKLGFPVPDARAVIEGRLNGNIA 
I I I I i I I I t M I I I i I I I I I t I t t I I I I I I I I I t t I I t I I I t I I I I I t I I I t I t I I I It I 
orf33ng-l VSVLLLLLVRQYTFNWESTLLSNAASVRAVEMLAWLPSKLGFPVPDARAVIEGRLNGNIA 

190 200 210 220 230 240 






250 260 270 280 290 300 

orf 33-1 . pep DARAWSGLLVGSIACYGILPRLLAWWCKILLKTSENGLDLEKPYYQAVIRRWQNKITDA 
MMIIillllti:|lll[illlllllllllMlltltllill I I I I I I I M I I I I I i I 
orf33ng-l DARAWSGLLVGSIVCYGILPRLLAWWCKILLKTSENGLDLEKTYYQAVIRRWQNKITDA 

250 260 270 280 290 300 






310 320 330 340 350 360 

orf 33-1 . pep DTRRETVSAVSPKIILNDAPKWAVMLETEWQDGEWFEGRLAQEWLDKGVATNREQVAALE 
I I t I I I I I I i I I I I : I I M I M I : I I I I I I I I I : I I I t I I I M I I I I I M : I I I I I I t I I 
orf 33ng- 1 DTRRETVSAVSPKI VLNDAPKWALMLETEWQDGQWFEGRLAQEWLDKGVAANREQVAALE 

310 320 330 340 350 360 






370 380 390 400 410 420 

orf 33-1 . pep TELKQKPAQLLIGVRAQTVPDRGVLRQIVRLSEAAQGGAWQLLAEQGLS DDLSEKLEHW 
i I I M I I M I M I M I I I I I I I t I I I I I I I I I t I M t I I I I I I I I I I I 1 ) I I t I I t I I i I 
orf33ng-l TELKQKPAQLLIGVRAQTVPDRGVLRQIVRLSEAAQGGAWQLLAEQGLS DDLSEKLEHW 

370 380 390 400 410 420 

430 440 
orf 33-1 . pep RNALAECGAAWLEPDRAAQEGRLKDQX 
I I I I : i I t I I I I I t I I : I I I I I i I 1 I I 
orf33ng-1 RNALTECGAAWLEPDRVAQEGRLKDQX 

430 440 

Based on the presence of several putative transmembrane domains in the gonococcal protein, it is 
predicted that the proteins from N. meningitidis and N. gonorrhoeae, and their epitopes, could be 
useful antigens for vaccines or diagnostics, or for raising antibodies. 
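Putative transmembrane domains of this kind are commonly flagged by scanning for strongly hydrophobic stretches. The sketch below uses a Kyte-Doolittle hydropathy scan with a 19-residue window and a 1.6 cut-off; these are conventional settings chosen for illustration and are an assumption, since this document does not state which prediction method was used. The function name `hydrophobic_windows` is hypothetical.

```python
# Kyte-Doolittle hydropathy values for the 20 standard amino acids.
KYTE_DOOLITTLE = {
    "A": 1.8, "R": -4.5, "N": -3.5, "D": -3.5, "C": 2.5,
    "Q": -3.5, "E": -3.5, "G": -0.4, "H": -3.2, "I": 4.5,
    "L": 3.8, "K": -3.9, "M": 1.9, "F": 2.8, "P": -1.6,
    "S": -0.8, "T": -0.7, "W": -0.9, "Y": -1.3, "V": 4.2,
}


def hydrophobic_windows(seq, window=19, threshold=1.6):
    """Return (start, end, mean) for every window above the threshold.

    Positions are 1-based and inclusive. Unknown residues (e.g. X) score 0.
    The window size and threshold are illustrative assumptions.
    """
    seq = seq.upper()
    hits = []
    for i in range(len(seq) - window + 1):
        segment = seq[i:i + window]
        mean = sum(KYTE_DOOLITTLE.get(r, 0.0) for r in segment) / window
        if mean > threshold:
            hits.append((i + 1, i + window, round(mean, 2)))
    return hits


# Stretch taken from residues 171-189 of SEQ ID 206 (ORF33ng-1).
print(hydrophobic_windows("LWLCTLLGMLVSVLLLLLV"))
# A charged stretch from near the N-terminus gives no hits.
print(hydrophobic_windows("EEKIFRRAEMIDRDRMLRDTLERVRAGS"))
```

Overlapping windows above the cut-off would normally be merged into a single candidate segment before being reported as a putative transmembrane domain.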

Example 25 

The following partial DNA sequence was identified in N.meningitidis <SEQ ID 207>: 

1 ..CAGAAGAGTT TGTCGAGAAT TTCTTTATGG GGTTTGGGCG GCGTGTTTTT 

51 CGGGGTGTCC GGTCTGGTAT GGTTTTCTTT GGGCGTTTCT TT.GAGTGCG 

101 CCTGTTTTTC GGGTGTTTCT TTTCGGGGTT CGGGACGGGG GACGTTTGTG 

151 GGCAGTACGG GGGTTTCTTT GAGTGTGTTT TCAGCTTGTG TTCC.GGCGT 

201 CGTCCGGCTG CCTGTCGGTT TGAGCTGTGT CGGCAGGTTG CG..GTTTGA 

251 CCCGGTTTTT CTTGGGTGCG GCAGGGGACG TCATTCTCCT GCCGCTTTCG 

301 TCTGTGCCGT CCGGCTGTGC GGGTTCGGAT GAGGCGGCGT GGTGGTGTTC 

351 GGGTTGGGCG GCATCTTGTJ CCGACTACGC CGTTTGGCAG CCAGAATTCG 

401 GTTTCGCGGG GGCTGTCGGT GTGTTGCGGT TCGGCTTGAA GGGTTTTGTC 

451 GTCC. . 

This corresponds to the amino acid sequence <SEQ ID 208; ORF34>: 

1 ..QKSLSRISLW GLGGVFFGVS GLVWFSLGVS XECACFSGVS FRGSGRGTFV 
51 GSTGVSLSVF SACVXGVVRL PVGLSCVGRL XXLTRFFLGA AGDVILLPLS 
101 SVPSGCAGSD EAAWWCSGWA ASCPTTPFGS QNSVSRGLSV CCGSA*RVLS 
151 S.. 

Further work revealed the complete nucleotide sequence <SEQ ID 209>: 



1 ATGATGATGC CGTTCATAAT GCTTCCTTGG ATTGCkGGTG TGCCTGCCGT 

51 GCCGGGTCAG AATAGGTTGT CCAGAATTTC TTTATGGGGT TTGGGCGGCG 

101 TGTTTTTCGG GGTGTCCGGT TTGGTATGGT TTTCTTTGGG CGTTTCTTTG 

151 GGCTGCGCCT GTTTTTCGGG TGTTTCTTTT CGGGGTTCGG GACGGGGGAC 

201 GTTTGTGGGC AGTACGGGGG TTTCTTTGAG TGTGTTTTCA GCTTGTGTTC 

251 CGGCGTCGTC CGGCTGCCTG TCGGTTTGAG CTGTGTCGGC AGGTTGCGGT 

301 TTGACCCGGT TTTTCTTGGG TGCGGCAGGG GACGGCAGTC CGCTGCCGCT 

351 TTCGTCTGTG CCGTCCGGCT GTGCGGGTTC GGATGAGGCG GCGTGGTGGT 

401 GTTCGGGTTG GGCGGCATCT TGTCCGACTA CGCCGTTTGG CAGCCAGAAT 

451 TCGGTTTCGC GGGGGCTGTC GGTGTGTTGC GGTTCGGCTT GAAGGGTTTT 

501 GTCGCCGTTC GGGTTGAATG TGCTGACGAT GCCTATTGCC AATGCGCCGA 

551 TGGCGGCGAT ACAGATGAGC AATACGGCGC GTATCAGGAG TTTGGGGGTC 

601 AGCCTGAAGG GTTTGTTCGG TTTTTTTGCC ATTTTGATTG TGCTTTTGGG 

651 GTGTCGGGCA ATGCCGTCTG AAGGCGGTTC AGACGGCATT GCCGAGTCAG 

701 CGTTGGACGT AGTTTTGGTA GAGGGTGATG ACTTTTTGTA CGCCGACGGT 

751 GGTGCTGACT TTTTGGGTAA TCTGCGCCTG TTCTTCGGGG GTGAGGATGC 

801 CCATAACGTA GGTTACGTTG CCGTAGGTAA CGATTTTGAC GCGCGCCTGT 

851 GTGGCGGGGC TGATGCCCAA CAGCGTGGCG CGGACTTTGG ATGTGTTCCA 

901 AGTGTCGCCG GCGATGTCGC CGGCAGTGCG CGGCAGGGAG GCGACGGTAA 

951 TATAGTTGTA CACGCCTTCG GCGGCCTGTT CGGAACGTGC AATCTGACCG 

1001 ACGAACTGTT TTTCGCCTTC GGTGGCGACT TGTCCGAGCA GCAGCAGGTG 

1051 GCGGTTGTAG CCGACGACGG AGATTTGGGG CGTGTAGCCT TTGGTTTGGT 

1101 TGTTTTGGCG CAGATAGGAA CGGGCGGTGG TTTCGATACG CAACGCCATA 

1151 ACGTTGTCGT CGGTTTGCGC GCCGGTGGTT CGGCGGTCGA CGGCGGATTT 

1201 CGCGCCGACG GCGGCGCTTC CGATTACTGC GCTGACGCAG CCGCTAAGGG 

1251 CAAGGCTGAA AATGGCGGCA ATCAGGGTGC GGACGGTGTG CGGTTTGGGT 

1301 TTCATCGGGT GCTTCCTTTC TTGGGCGTTT CAGACGGCAT TGCTTTGCGC 

1351 CATGCCGTCT GA 

This corresponds to the amino acid sequence <SEQ ID 210; ORF34-1>: 



1 MMMPFIMLPW IAGVPAVPGQ NRLSRISLWG LGGVFFGVSG LVWFSLGVSL 
51 GCACFSGVSF RGSGRGTFVG STGVSLSVFS ACVPASSGCL SV*AVSAGCG 
101 LTRFFLGAAG DGSPLPLSSV PSGCAGSDEA AWWCSGWAAS CPTTPFGSQN 
151 SVSRGLSVCC GSA*RVLSPF GLNVLTMPIA NAPMAAIQMS NTARIRSLGV 
201 SLKGLFGFFA ILIVLLGCRA MPSEGGSDGI AESALDVVLV EGDDFLYADG 
251 GADFLGNLRL FFGGEDAHNV GYVAVGNDFD ARLCGGADAQ QRGADFGCVP 
301 SVAGDVAGSA RQGGDGNIVV HAFGGLFGTC NLTDELFFAF GGDLSEQQQV 
351 AVVADDGDLG RVAFGLVVLA QIGTGGGFDT QRHNVVVGLR AGGSAVDGGF 
401 RADGGASDYC ADAAAKGKAE NGGNQGADGV RFGFHRVLPF LGVSDGIALR 
451 HAV* 



Computer analysis of this amino acid sequence gave the following results: 
Homology with a predicted ORF from N.meningitidis (strain A) 

ORF34 shows 73.3% identity over a 161aa overlap with an ORF (ORF34a) from strain A of 
N.meningitidis: 



orf34 .pep 
orf34a 

orf 34 .pep 
orf34a 

orf 34 .pep 
orf 34a 



10 20 30 

QKSLSR ISLWGLGGVFFGVSGLVW FSLG VSXE- 

i I I t I 



-CAC 



II IN t I I I I t I I I t I M I I I I I i t I I I III 
MMXPXIMLPWIAGVPAV PGQKRLS RXSLWGLGGXFFGVSGLVW FSL GVSXSLGVSXGCAC 
10 20 30 40 50 60 

40 50 60 70 80 90 

FSGV SFRGSGRG TFVGSTGVSLSVFSACVX GWRLPVGLSCVGRLXX LTRFFLGA 

I I I I II II tl t I II II I I I II I I t I I I I : t : : : I : : I I I I II 

FSGVSFRGSGRG TFVGSTGVSLSVFSACA PASSGCLSVXAVSAGCGLTRXFXGA 

70 80 90 100 110 



100 110 120 130 140 150 

AGDVILLPLSSVPSGCAGSDEAAWWCSGWAASCPTTPFGSQNSVSRGLSVCCGSAXRVLS 

III IIIIIMIIIIhll I IMIINIIMIIIIIIIIIIIIIIIIII: nil 
AGDGSPLPLSSVPSGCAGADEEAXXCSGWAASCPTTPFGSQNSVSRGLSVCCGSVWRVLS 
120 130 140 150 160 170 



orf 34. pep S 

orf 34a PFGXNVLTMPIANAPMAVIQMSNTARIRSL GVSLKGLFXFFAILIVLL GCRAMPSEGGSD 
180 190 200 210 220 230 

The complete length ORF34a nucleotide sequence <SEQ ID 211> is: 

1 ATGATGATNC CGTTNATAAT GCTTCCTTGG ATTGCGGGTG TGCCTGCCGT 

51 GCCGGGTCAG AAGAGGTTGT CGAGAANTTC TTTATGGGGT TTAGGCGGCN 

101 TGTTTTTCGG GGTGTCCGGT TTGGTATGGT TTTCTTTGGG CGTTTCTNTT 

151 TCTTTGGGTG TTTCTNTGGG CTGTGCCTGT TTTTCGGGTG TTTCTTTTCG 

201 GGGTTCGGGA CGGGGGACGT TTGTGGGCAG TACNGGGGTT TCTTTGAGTG 

251 TGTTTTCAGC TTGTGCTCCG GCGTCGTCCG GCTGCCTGTC GGTTTNAGCT 

301 GTGTCGGCAG GTTGCGGTTT GACCCGGNTT TTCTTNGGTG CGGCAGGGGA 

351 CGGCAGTCCG CTGCCGCTTT CGTCTGTGCC GTCCGGCTGT GCGGGTGCGG 

401 ATGAGGAGGC GTNGTNGTGT TCGGGTTGGG CGGCATCTTG TCCGACTACG 

451 CCGTTTGGCA GCCAGAATTC GGTTTCGCGG GGGCTGTCGG TGTGTTGCGG 

501 TTCGGTNTGG AGGGTTTTGT CNCCGTTCGG GTNGAATGTG CTGACGATGC 

551 CTATTGCCAA TGCGCCGATG GCGGTGATAC AGATGAGCAA TACGGCGCGT 

601 ATCAGGAGTT TGGGGGTCAG CCTGAAGGGT TTGTTCNGTT TTTTTGCCAT 

651 TTTGATTGTG CTTTTGGGGT GTCGGGCAAT GCCGTCTGAA GGCGGTTCAG 

701 ACGGCATTGC CGAGTCAGCG TTGGACGTAG TTTNGGTAGA GGGTGATGAC 

751 TTTTTGTACG CCGACGGTGG TGCTGACTTT TTGGGTAATC TGCGCCTGTT 

801 CTTCGGGGGT GAGGATGCCC ATAACGTAGG TTACGTTGCC GTAGGTAACG 

851 ATTTTGACGC GCGCCTGTGT GGCGGGGCTG ATGCCCAACA GCGTGGCGCG 

901 GACTTTGGAT GTGTTCCAAG TGTCGCCGGC GATGTCGCCG GCAGTGCGCG 

951 GCAGGGAGGC GACGGTAATG TANTTGTACA CGCCTTCGGC GGCCTGTTCG 

1001 GAACGTGCAA TCTGACCGAC GAACTGTTTC TCGCCTTCGG TGGCGACTTG 

1051 TCCGAGCAGC AGCAGGTGGC GGTTGTAGCC GACAACGGAG ATTTGGGGCG 

1101 TGTANCCTTT GGTTTGGTTG TTTTGGCGCA GATAGGAGCG GGCGGTGGTT 

1151 TCGATACGCA GCGCCATTAC GTTGTCGTCG GTTNGCGCGC CGGTGGTTCG 

1201 GCGGTCGACG GCGGATTTCG CGCCGACCGC CGCGCCGCCG ACGACTGCGC 

1251 TGACGCAGCC GCCGAGGGCA AGGCTGAGGA CGGCGGCAGT CAGGGTGCGG 

1301 ACGGTGTGCG GTTTGGGTTT CATCGGGTGC TTCCTTTCTT GGGCGTTTCA 

1351 GACGGCATTG CTTTGCGCCA TGCCGTCTGA 
This encodes a protein having amino acid sequence <SEQ ID 212>: 



1 MMXPXIMLPW IAGVPAVPGQ KRLSRXSLWG LGGXFFGVSG LVWFSLGVSX 
51 SLGVSXGCAC FSGVSFRGSG RGTFVGSTGV SLSVFSACAP ASSGCLSVXA 
101 VSAGCGLTRX FXGAAGDGSP LPLSSVPSGC AGADEEAXXC SGWAASCPTT 
151 PFGSQNSVSR GLSVCCGSVW RVLSPFGXNV LTMPIANAPM AVIQMSNTAR 
201 IRSLGVSLKG LFXFFAILIV LLGCRAMPSE GGSDGIAESA LDVVXVEGDD 
251 FLYADGGADF LGNLRLFFGG EDAHNVGYVA VGNDFDARLC GGADAQQRGA 
301 DFGCVPSVAG DVAGSARQGG DGNVXVHAFG GLFGTCNLTD ELFLAFGGDL 
351 SEQQQVAVVA DNGDLGRVXF GLVVLAQIGA GGGFDTQRHY VVVGXRAGGS 
401 AVDGGFRADR RAADDCADAA AEGKAEDGGS QGADGVRFGF HRVLPFLGVS 
451 DGIALRHAV* 



ORF34a and ORF34-1 show 91.3% identity in 459 aa overlap: 




orf 34a .pep 
orf34-l 

orf34a.pep 
orf34-l 

orf 34a. pep 
orf34-l 

orf 34a. pep 
orf34-l 

orf 34a. pep 
orf34-l 

orf 34a. pep 
orf34-l 

orf 34a. pep 
orf34-l 

orf 34a. pep 
orf34-l 



10 20 30 40 50 60 

MMXPXIMLPWIAGVPAVPGQKRLSRXSLWGLGGXFFGVSGLVWFSLGVSXSLGVSXGCAC 

It I IIMIIIIIIIItll:llll ItlllM I t I I I I I n I I 1 I I I illl 

MMMPFIMLPWIAGVPAVPGQNRLSRISLWGLGGVFFGVSGLVWFSLGVSL GCAC 

10 20 30 40 50 

70 BO 90 100 110 120 

FSGVSFRGSGRGTFVGSTGVSLSVFSACAPASSGCLSVXAVSAGCGLTRXFXGAAGDGSP 
I M i I I M M I i I I I I t M I t I t I I I I I : I M I I I I I I I I I I I I It t I I I I II t I II t 
FSGVSFRGSGRGTFVGSTGVSLSVFSACVPASSGCLSVXAVSAGCGLTRFFLGAAGDGSP 
60 70 80 90 100 110 

130 140 150 160 170 180 

LPLSSVPSGCAGADEEAXXCSGWAASCPTTPFGSQNSVSRGLSVCCGSVWRVLSPFGXNV 

I II i It t I I 1 t I : II I I I I I t I t I t I I II t I I I I I t I I I t I I I I I : I I I I t I t [I 

LPLSSVPSGCAGSDEAAWWCSGWAASCPTTPFGSQNSVSRGLSVCCGSAXRVLSPFGLNV 
120 130 140 150 160 170 

190 200 210 220 230 240 

LTMPIANAPMAVIQMSNTARIRSLGVSLKGLFXFFAILIVLLGCRAMPSEGGSDGIAESA 

I I I t I I I I I II : t t I I M I I I i I I I M I I I II I I I I I I t t I I t I t t I I I t I I I It I tl t 
LTMPIANAPMAAIQMSNTARIRSLGVSLKGLFGFFAILIVLLGCRAMPSEGGSDGIAESA 
180 190 200 210 220 230 

250 260 270 280 290 300 

LDWXVEGDDFLYADGGADFLGNLRLFFGGEDAHNVGYVAVGNDFDARLCGGADAQQRGA 
Mil llllllllllttlltlltitlllltltllllllMllltltlllllltlllltll 
LDWLVEGDDFLYADGGADFLGNLRLFFGGEDAHNVGYVAVGNDFDARLCGGADAQQRGA 
240 250 260 270 280 290 

310 320 330 340 350 360 

DFGCVPSVAGDVAGSARQGGDGNVXVHAFGGLFGTCNLTDELFLAFGGDLSEQQQVAWA 

Mtlltllitinttlilltlll: I II I I I I t I I I I I I I I I 1 : I I I M I I I I I i I I 1 I I 

DFGCVPSVAGDVAGSARQGGDGNIWHAFGGLFGTCNLTDELFFAFGGDLSEQQQVAWA 
300 310 320 330 340 350 

370 380 390 400 410 420 

DNGDLGRVXFGLWLAQIGAGGGFDTQRHYVWGXRAGGSAVDGGFRADRRAADDCADAA 

t : I I I I t I I I I 1 I I I II 1 : I It 1 t 1 t II till I 1 I 1 1 1 I I 1 t I t t 1 I : t II I t I 
DDGDLGRVAFGLWLAQIGTGGGFDTQRHNWVGLRAGGSAVDGGFRADGGASDYCADAA 
360 370 380 390 400 410 

430 440 450 460 

AEGKAEDGGSQGADGVRFGFHRVLPFLGVSDGIALRHAVX 
|:tllt:|l:IIIIMIItllllltlltllltlllllttt 
AKGKAENGGNQGADGVRFGFHRVLPFLGVSDGIALRHAVX 
420 430 440 450 



Homology with a predicted ORF from N.gonorrhoeae 

ORF34 shows 77.6% identity over a 161aa overlap with a predicted ORF (ORF34.ng) from 
N.gonorrhoeae: 



orf 34 .pep 



QKSLSRISLWGLGGVFFGVSGLVWFSLGVSXE CAC 35 
orf34ng 
orf34 .pep 
orf 34ng 
orf34 .pep 
orf34ng 
orf 3 4 .pep 
orf34ng 



II I II i I I I I I : li I I I I I I I M I I I I II III 
MMMPFIMLPWIAGVPAVPGQKRLSRISLWGLAGVFFGVSGLVWFSLGVSFSLGVSLGCAC 



60 



90 



FSGVSFRGSGRGTFVGSTGVSLSVFSACVXGWRLPVGLSCV GRLXXLTRFFLGA 

tllllltltl |:|IIIIillllllllll :|l: I : II IIIIIIM 

FSGVSFRGSGWGAFVGSTGVSLSVFSACVP VPVNESAARAASEGR— GLTRFFLGA 114 

AGDVILLPLSSVPSGCAGSDEAAWWCSGWAASCPTTPFGSQNSVSRGLSVCCGSAXRVLS 150 
Ml lllllllllllllllllltlllltlllll:llllllllllllllllll: nil 
AGDGSPLPLSSVPSGCAGSDEMWWCSGWAASCPTAPFGSQNSVSRGLSVCCGSVWRVLS 174 

S 175 

PFGLNVLTMPTANAPMAVIQMSNTARIRSLGVSLKGLFGFFAILIVLLGCRAMPSEGGSD 234 



The complete length ORF34ng nucleotide sequence <SEQ ID 213> is: 



1 ATGATGATGC CGTTCATAAT GCTTCCTTGG ATTGCGGGTG TGCCTGCCGT 
51 GCCGGGTCAA AAGAGGTTGT CGAGAATCTC TTTATGGGGT TTGGCCGGCG 
101 TGTTTTTCGG GGTGTCCGGT TTGGTATGGT TTTCTTTGGG CGTTTCTTTT 
151 TCTTTGGGTG TTTCTTTGGG CTGCGCCTGT TTTTCGGGTG TTTCTTTTCG 
201 GGGTTCGGGA TGGGGGGCGT TTGTGGGCAG TACGGGGGTT TCTTTGAGTG 
251 TGTTTTCAGC TTGTGTTCCG GTGCCGGTTA ACGAATCGGC TGCCCGGGCC 
301 GCATCCGAAG GGCGCGGTTT gACCCGGTTT TTCTTGGGTG CGGCAGGGGA 
351 CGGCAGTCCG CTGCCGCTTT CTTCTGTGCC GTCCGGCTGT GCGGGTTCGG 
401 ATGAGGCGGC GTGGTGGTGT TCGGGTTGGG CGGCATCTTG TCCGACGGCG 
451 CCGTTTGGCA GCCAGAATTC GGTTTCGCGG GGGCTGTCGG TGTGTTGCGG 
501 TTCGGTTTGG AGGGTTTTGT CGCCGTTCGG GTTGAATGTG CTGACGATGC 
551 CTACTGCCAA TGCGCCGATG GCGGTGATAC AGATGAGCAA TACGGCGCGT 
601 ATCAGGAGTT TGGGGGTCAG CCTGAAGGGT TTGTTCGGTT TTTTTGCCAT 
651 TTTGATTGTG CTTTTGGGGT GTCGGGCAAT GCCGTCTGAA GGCGGTTCAG 
701 ACGGCATTGC CGAGTCAGCG TTGGACGTAG TTTTGGTAGA GGGTAATGAC 
751 TTTTTGTACG CCGAcggTGG TGCTGACTTT TTGGGTAATC TGCGCCTGTT 
801 CTTCGGGGGT GAGGATGCCC ATAACGTAGG TTACATTGCC GTAGGTAATG 
851 ATTTTGACGC GCGCCTGTGT AGCGGGGCTG ATGCCCAGCA GcgtgGCGCG 
901 GACTTTGGAC GTGTTCCAAG TGTCGCCGGC GATGTCGCCC GCAGTGCGCG 
951 GCAGGGAGGC GACGGTAATG TAGTTGTATA CGCCTTCGGC GGCCTGTTCG 
1001 GAACGTGCAA TCTGACCGAC GAACTGTTTT TCGCCTTCGG TGGCGACTTG 
1051 TCCGAGCAGC AGCAGGTGGC GGTTGTAGCC GACGACGGAG ATTTGGGGCG 
1101 TGTAGCCTTT GGTTTGGTTG TTTTGGCGCA GGTAGGAACG GGCGGTGGTT 
1151 TCGATACGCA ACGCCATAAC GTtgtCATCG GTTtgcgcgc CGGTGGTTcg 
1201 gCGGTCGATG ACGGATTTTG CGCCGACGGC GGCCCCGCCG ACGACTGCGC 
1251 TGAAGCAGCC GCCGAGGGCA AGGCTGAGGA CGGCGGCAAT CAGGGTGCGG 
1301 ACGGTGTGTG GTTTGGGTTT CATCGGGGAC TTCCTTTCTT GGGCGTTTCA 
1351 GACGGCATTG CTTTGCGCCA TGCCGTCTGA 



This encodes a protein having amino acid sequence <SEQ ID 214>: 



1 MMMPFIMLPW IAGVPAVPGQ KRLSRISLWG LAGVFFGVSG LVWFSLGVSF 

51 SLGVSLGCAC FSGVSFRGSG WGAFVGSTGV SLSVFSACVP VPVNESAARA 

101 ASEGRGLTRF FLGAAGDGSP LPLSSVPSGC AGSDEAAWWC SGWAASCPTA 

151 PFGSQNSVSR GLSVCCGSVW RVLSPFGLNV LTMPTANAPM AVIQMSNTAR 

201 IRSLGVSLKG LFGFFAILIV LLGCRAMPSE GGSDGIAESA LDVVLVEGND 

251 FLYADGGADF LGNLRLFFGG EDAHNVGYIA VGNDFDARLC SGADAQQRGA 

301 DFGRVPSVAG DVARSARQGG DGNVVVYAFG GLFGTCNLTD ELFFAFGGDL 

351 SEQQQVAVVA DDGDLGRVAF GLVVLAQVGT GGGFDTQRHN VVIGLRAGGS 

401 AVDDGFCADG GPADDCAEAA AEGKAEDGGN QGADGVWFGF HRGLPFLGVS 

451 DGIALRHAV* 

ORF34ng and ORF34-1 show 90.0% identity in 459 aa overlap: 



10 20 30 40 4 50 

orf 34-1 . pep blMMPFIMLPWIAGVPAVPGQNRLSRISLWGLGGVFPGVSGLVWFSLGVS LGCAC 

lllllllt llllll|:|Mlttltll:lllllltllllllllM lltll 

orf34ng MMMPFIMLPW lAGVPAVPGQKRLSRISLWGLAGVFFGVSGLVWFSLGVSFSLGVSLGCAC 

10 20 30 40 50 60 



60 70 80 90 100 110 

orf 34-1 . pep FSGVSFRGSGRGTFVGSTGVSLSVFSACVPASSGCLSVXAVSAGCGLTRFFLGAAGDGSP 
I I I I I I I t I I I : II I II I I I I I t I I I I I I : : : : i : I I I I I I M I I I I I I I I I 
orf 34ng FSGVSFRGSGWGAFVGSTGVSLSVFSACVPVPVNESAARAASEGRGLTRFFLGAAGDGSP 
70 80 90 100 110 120 



120 130 140 150 160 170 

orf 34-1 . pep LPLSSVPSGCAGSDEAAWWCSGWAASCPTTPFGSQNSVSRGLSVCCGSAXRVLSPFGLNV 
llllllltlllllllltlllltlltMttrllllllMlillilllll: llilllllll 
orf34ng LPLSSVPSGCAGSDEAAWWCSGWAASCPTAPFGSQNSVSRGLSVCCGSVWRVLSPFGLNV 

130 140 150 160 170 180 

180 190 200 210 220 230 

orf 34- 1 . pep LTMPIANAPMAAIQMSNTARIRSLGVSLKGLFGFFAILXVLLGCRAMPSEGGSDGIAESA 
nil lllll):|IMIItllllltllllltMIIIIIMIIilitllllilllilini 
orf34ng LTMPTANAPMAVIQMSNTARIRSLGVSLKGLFGFFAILIVLLGCRAMPSEGGSDGIAESA 

190 200 210 220 230 240 

240 250 260 270 280 290 

orf 3 4 - 1 . pep LDWLVEGDDFLYADGGADFLGNLRLFFGGEDAHNVGYVAVGNDFDARLCGGADAQQRGA 
|f||||||:tNilltli[IIIIIIIIIMttlllltl:|IIIIIIIIM:ttlllllll 
orf34ng LDWLVEGNDFLYADGGADFLGNLRLFFGGEDAHNVGYIAVGNDFDARLCSGADAQQRGA 

250 260 270 280 290 300 

300 310 320 330 340 350 

orf 34-1 . pep DFGCVPSVAGDVAGSARQGGDGNI\AmAFGGLFGTCNLTDELFFAFGGDLSEQQQVAWA 
III lllllllll llllltll):||:itlMllllllllllttlMtlllllMIIMI 
or f 3 4ng DFGRVPSVAGDVARSARQGGDGNVWYAFGGLFGTCNLTDELFFAFGGDLSEQQQVAWA 

310 320 330 340 350 360 

360 370 380 390 400 410 

orf 34-1 . pep DDGDLGRVAFGLWLAQIGTGGGFDTQRHNVWGLRAGGSAVDGGFRADGGASDYCADAA 
llllllllllllllMI:tlllllllllllll:ll)llllltl II MM :l Ihil 
orf34ng DDGDLGRVAFGLWLAQVGTGGGFDTQRHNWIGLRAGGSAVDDGFCADGGPADDCAEAA 

370 380 390 400 410 420 

420 430 440 450 

orf 34-1 . pep AKGKAENGGNQGADGVRFGFHRVLPFLGVSDGIALRHAVX 
|:MII:ttlllllll tllll I II I 1 I II I I I I I I I M 
orf34ng AEGKAEDGGNQGADGVWFGFHRGLPFLGVSDGIALRHAVX 

430 440 450 460 

Based on this analysis, including the presence of a putative leader sequence (double-underlined)
and several putative transmembrane domains (single-underlined) in the gonococcal protein, it is
predicted that the proteins from N. meningitidis and N. gonorrhoeae, and their epitopes, could be
useful antigens for vaccines or diagnostics, or for raising antibodies.



Example 26 

The following partial DNA sequence was identified in N. meningitidis <SEQ ID 215>:

1 ATGAAAACCT TCTTCAAAAC CCTTTCCGCC GCCGCACTCG CGCTCATCCT 

51 CGCCGCCTGC GGATT.CAAA AAGACAGCGC GCCCGCCGCA TCCGCTTCTG

101 CCGCCGCCGA CAACGGCGCG GCGJAAAAAA GAAATCGTCT TCGGCACGAC 

151 CGTCGGCGAC TTCGGCGATA TGGTCAAAGA ACAAATCCAA GCCGAGCTGG 

201 AGAAAAAAGG CTACACCGTC AAACTGGTCG AGTTTACCGA CTATGTACGC 

251 CCGAATCTGG CATTGGCTGA GGGCGAGTTG 

This corresponds to the amino acid sequence <SEQ ID 216; ORF4>:

1 MKTFFKTLSA AALALILAAC G.QKDSAPAA SASAAADNGA AKKEIVFGTT 
51 VGDFGDMVKE QIQAELEKKG YTVKLVEFTD YVRPNLALAE GEL 
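The correspondence between a nucleotide listing and its amino acid sequence can be checked with a short translation routine. The following is a minimal sketch, assuming the standard genetic code and reading frame 1; the names `CODON_TABLE` and `translate` are illustrative and not part of the source. Codons containing an undetermined base (such as `N`) are reported as `X`, in line with the convention used in the sequence listings.

```python
from itertools import product

# Standard genetic code, codons enumerated with base order T, C, A, G
# (first base varies slowest). This table is an assumption of the sketch.
BASES = "TCAG"
AMINO = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {"".join(c): aa for c, aa in zip(product(BASES, repeat=3), AMINO)}


def translate(dna: str) -> str:
    """Translate a DNA string in frame 1; whitespace is ignored.

    Any codon not in the table (for example one containing 'N')
    is emitted as 'X'. A trailing incomplete codon is dropped.
    """
    seq = "".join(dna.split()).upper()
    usable = len(seq) - len(seq) % 3
    return "".join(CODON_TABLE.get(seq[i:i + 3], "X") for i in range(0, usable, 3))


# First 30 nucleotides of the partial sequence listed above
print(translate("ATGAAAACCT TCTTCAAAAC CCTTTCCGCC"))
```

Applied to the first 30 nucleotides of the partial sequence, the routine returns the first ten residues of the listed protein.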

Further sequence analysis revealed the complete nucleotide sequence <SEQ ID 217>:

1 ATGAAAACCT TCTTCAAAAC CCTTTCCGCC GCCGCACTCG CGCTCATCCT

51 CGCCGCCTGC GGCGGTCAAA AAGACAGCGC GCCCGCCGCA TCCGCTTCTG

101 CCGCCGCCGA CAACGGCGCG GCGAAAAAAG AAATCGTCTT CGGCACGACC

151 GTCGGCGACT TCGGCGATAT GGTCAAAGAA CAAATCCAAG CCGAGCTGGA 

201 GAAAAAAGGC TACACCGTCA AACTGGTCGA GTTTACCGAC TATGTACGCC 
251 CGAATCTGGC ATTGGCTGAG GGCGAGTTGG ACATCAACGT CTTCCAACAC

301 AAACCCTATC TTGACGACTT CAAAAAAGAA CACAATCTGG ACATCACCGA 

351 AGTCTTCCAA GTGCCGACCG CGCCTTTGGG ACTGTACCCG GGCAAGCTGA 

401 AATCGCTGGA AGAAGTCAAA GACGGCAGCA CCGTATCCGC GCCCAACGAC

451 CCGTCCAACT TCGCCCGCGT CTTGGTGATG CTCGACGAAC TGGGTTGGAT 

501 CAAACTCAAA GACGGCATCA ATCCGTTGAC CGCATCCAAA GCGGACATCG 

551 CCGAGAACCT GAAAAACATC AAAATCGTCG AGCTTGAAGC CGCGCAACTG 

601 CCGCGTAGCC GCGCCGACGT GGATTTTGCC GTCGTCAACG GCAACTACGC 

651 CATAAGCAGC GGCATGAAGC TGACCGAAGC CCTGTTCCAA GAACCGAGCT 

701 TTGCCTATGT CAACTGGTCT GCCGTCAAAA CCGCCGACAA AGACAGCCAA 

751 TGGCTTAAAG ACGTAACCGA GGCCTATAAC TCCGACGCGT TCAAAGCCTA 

801 CGCGCACAAA CGCTTCGAGG GCTACAAATC CCCTGCCGCA TGGAATGAAG 

851 GCGCAGCCAA ATAA 

This corresponds to the amino acid sequence <SEQ ID 218; ORF4-1>:

1 MKTFFKTLSA AALALILAAC GGQKDSAPAA SASAAADNGA AKKEIVFGTT

51 VGDFGDMVKE QIQAELEKKG YTVKLVEFTD YVRPNLALAE GELDINVFQH 

101 KPYLDDFKKE HNLDITEVFQ VPTAPLGLYP GKLKSLEEVK DGSTVSAPND 

151 PSNFARVLVM LDELGWIKLK DGINPLTASK ADIAENLKNI KIVELEAAQL 

201 PRSRADVDFA VVNGNYAISS GMKLTEALFQ EPSFAYVNWS AVKTADKDSQ

251 WLKDVTEAYN SDAFKAYAHK RFEGYKSPAA WNEGAAK* 

Computer analysis of this amino acid sequence gave the following results: 
Homology with a predicted ORF from N. meningitidis (strain A)

ORF4 shows 93.5% identity over a 93aa overlap with an ORF (ORF4a) from strain A of
N. meningitidis:

10 20 30 40 50 59 

orf 4 . pep MKTFFKTLSAAAIALILAA CG-QKPSAPAASASAAADNGAT^KKEIVFGTTVGDFGDMVKE 
lllllllltllMMtltlll IIMIilllllllMIMI llllllllilllllllll 
or f 4 a MKTFFKTLSAAALALILAA CGGQKDSAPAASASAAADNGAAXKEIVFGTTVGDFGDMVKE 

10 20 30 40 50 60 

60 70 80 90 

orf 4 . pep QIQAELEKKGYTVKLVEFTDYVRPNLALAEGEL 
II lltllllinilt lllil llllllllt 
orf 4a XIQPELEKKGYTVKLVEXTDYVRXNLALAEGELDINVXQHXXYLDDXKKXHNLDITXVXQ 
70 80 90 100 110 120 

orf 4a VPTAPLGLYPGKLKSLXXVKXGSTVSAPNDPXXFXRVLVMLDELGXIKLKDXIXXXXXXX 
130 140 150 160 170 180 

The complete length ORF4a nucleotide sequence <SEQ ID 219> is:

1 ATGAAAACCT TCTTCAAAAC CCTTTCCGCC GCCGCACTCG CGCTCATCCT 

51 CGCCGCCTGC GGCGGTCAAA AAGATAGCGC GCCCGCCGCA TCCGCTTCTG 

101 CCGCCGCCGA CAACGGCGCG GCGAANAAAG AAATCGTCTT CGGCACGACC 

151 GTCGGCGACT TCGGCGATAT GGTCAAAGAA CANATCCAAC CCGAGCTGGA 

201 GAAAAAAGGC TACACCGTCA AACTGGTCGA GTNTACCGAC TATGTGCGCN 

251 CGAATCTGGC ATTGGCTGAG GGCGAGTTGG ACATCAACGT CTTNCAACAC 

301 ANACNCTATC TTGACGACTN CAAAAAANAA CACAATCTGG ACATCACCNN 

351 AGTCTTNCAA GTGCCGACCG CGCCTTTGGG ACTGTACCCG GGCAAGCTGA 

401 AATCGCTGGA NNAAGTCAAA GANGGCAGCA CCGTATCCGC GCCCAACGAC 

451 CCGTNNNACT TCGNCCGCGT CTTGGTGATG CTCGACGAAC TGGGTTNGAT 

501 CAAACTCAAA GACNGCATCA NNNNGNNGNN NNNANCNANA NNNGANANNN 

551 NNNNANNNNT NNNNNNNNNN NNNNNCNNCG NNNNNNNANN NNNNNNNNNN 

601 NCGNNTNNNN NNGCNNNNNT NNANNNTNNN NNCNNCNNNN NNNNNTNNNN 

651 NANNANNAGC GGCATGAAGC TGACCGAAGC CCTGTTCCAA GAACCGAGCT

701 TTGCCTATGT CAACTGGTCT GCCGTCAAAA CCGCCGACAA AGACAGCCAA 

751 TGGCTTAAAG ACGTAACCGA GGCCTATAAC TCCGACGCGT TCAAAGCCTA 

801 CGCGCACAAA CGCTTCGAGG GCTACAAATC CCCTGCCGCA TGGAATGAAG

851 GCGCAGCCAA ATAA 

This is predicted to encode a protein having amino acid sequence <SEQ ID 220>:

1 MKTFFKTLSA AALALILAAC GGQKDSAPAA SASAAADNGA AXKEIVFGTT
51 VGDFGDMVKE XIQPELEKKG YTVKLVEXTD YVRXNLALAE GELDINVXQH
101 XXYLDDXKKX HNLDITXVXQ VPTAPLGLYP GKLKSLXXVK XGSTVSAPND
151 PXXFXRVLVM LDELGXIKLK DXIXXXXXXX XXXXXXXXXX XXXXXXXXXX
201 XXXXAXXXXX XXXXXXXXXS GMKLTEALFQ EPSFAYVNWS AVKTADKDSQ
251 WLKDVTEAYN SDAFKAYAHK RFEGYKSPAA WNEGAAK*

A leader peptide is underlined.



Further analysis of these strain A sequences revealed the complete DNA sequence <SEQ ID 221>: 

1 ATGAAAACCT TCTTCAAAAC CCTTTCCGCC GCCGCACTCG CGCTCATCCT 

51 CGCCGCCTGC GGCGGTCAAA AAGATAGCGC GCCCGCCGCA TCCGCTTCTG 

101 CCGCCGCCGA CAACGGCGCG GCGAAAAAAG AAATCGTCTT CGGCACGACC 

151 GTCGGCGACT TCGGCGATAT GGTCAAAGAA CAAATCCAAC CCGAGCTGGA 

201 GAAAAAAGGC TACACCGTCA AACTGGTCGA GTTTACCGAC TATGTGCGCC* 

251 CGAATCTGGC ATTGGCTGAG GGCGAGTTGG ACATCAACGT CTTCCAACAC 

301 AAACCCTATC TTGACGACTT CAAAAAAGAA CACAATCTGG ACATCACCGA 

351 AGTCTTCCAA GTGCCGACCG CGCCTTTGGG ACTGTACCCG GGCAAGCTGA 

401 AATCGCTGGA AGAAGTCAAA GACGGCAGCA CCGTATCCGC GCCCAACGAC 

451 CCGTCCAACT TCGCCCGCGT CTTGGTGATG CTCGACGAAC TGGGTTGGAT 

501 CAAACTCAAA GACGGCATCA ATCCGCTGAC CGCATCCAAA GCGGACATTG 

551 CCGAAAACCT GAAAAACATC AAAATCGTCG AGCTTGAAGC CGCGCAACTG 

601 CCGCGTAGCC GCGCCGACGT GGATTTTGCC GTCGTCAACG GCAACTACGC 

651 CATAAGCAGC GGCATGAAGC TGACCGAAGC CCTGTTCCAA GAACCGAGCT

701 TTGCCTATGT CAACTGGTCT GCCGTCAAAA CCGCCGACAA AGACAGCCAA 

751 TGGCTTAAAG ACGTAACCGA GGCCTATAAC TCCGACGCGT TCAAAGCCTA 

801 CGCGCACAAA CGCTTCGAGG GCTACAAATC CCCTGCCGCA TGGAATGAAG 

851 GCGCAGCCAA ATAA 

This encodes a protein having amino acid sequence <SEQ ID 222; ORF4a-1>:

1 MKTFFKTLSA AALALILAAC GGQKDSAPAA SASAAADNGA AKKEIVFGTT

51 VGDFGDMVKE QIQPELEKKG YTVKLVEFTD YVRPNLALAE GELDINVFQH

101 KPYLDDFKKE HNLDITEVFQ VPTAPLGLYP GKLKSLEEVK DGSTVSAPND

151 PSNFARVLVM LDELGWIKLK DGINPLTASK ADIAENLKNI KIVELEAAQL

201 PRSRADVDFA VVNGNYAISS GMKLTEALFQ EPSFAYVNWS AVKTADKDSQ

251 WLKDVTEAYN SDAFKAYAHK RFEGYKSPAA WNEGAAK*

ORF4a-1 and ORF4-1 show 99.7% identity in 287 aa overlap:



10 20 30 40 50 60 

orf4a-l MKTFFKTLSAAALALILAACGGQKDSAPAASASAAADNGAAKKEIVFGTTVGDFGDMVKE 
I I I I I t 1 I I I i I t I I I I I M ) I I I I I I I 1 I t i I I I I I I I H I t I t I I I I f I I I I I It I I t 
orf4-l MKTFFKTLSAAALALILAACGGQKDSAPAASASAAADNGAAKKEIVFGTTVGDFGDMVKE 

10 20 30 40 50 60 



70 80 90 100 110 120 

orf4a-l QIQPELEKKGYTVKLVEFTDYVRPNLALAEGELDINVFQHKPYLDDFKKEHNLDITEVFQ 
Mi IMIIIMillltlllMIIIIMIIIIIIIMMIIIIirillMIIMIMlM 
orf4-l QIQAELEKKGYTVKLVEFTDYVRPNLALAEGELDINVFQHKPYLDDFKKEHNLDITEVFQ 

70 80 90 100 110 120 

130 140 150 160 170 180 

orf4a-l VPTAPLGLYPGKLKSLEEVKDGSTVSAPNDPSNFARVLVMLDELGWIKLKDGINPLTASK 
I I I M I I I I I I I M I I I I I t M I I I I I I I M I [ I I I I i I I I I I I I I t I I I I I I I I t I t I I 
orf4-l VPTAPLGLYPGKLKSLEEVKDGSTVSAPNDPSNFARVLVMLDELGWIKLKDGINPLTASK 

130 140 150 160 170 180 

190 200 210 220 230 240 

orf4a-1 ADIAENLKNIKIVELEAAQLPRSRADVDFAVVNGNYAISSGMKLTEALFQEPSFAYVNWS
I I I I [ I I I I I I t 1 I I I I I i I t i I I i I I I I M I I i I I t t I I t I i I I I I I I I I t I I I I I I I I 
orf4-1 ADIAENLKNIKIVELEAAQLPRSRADVDFAVVNGNYAISSGMKLTEALFQEPSFAYVNWS

190 200 210 220 230 240 

250 260 270 280 

orf4a-l AVKTADKDSQWLKDVTEAYNSDAFKAYAHKRFEGYKSPAAWNEGAAKX 
I I i I I I I t I I I I I I I I i I I I I I I I I I t I I I I I I I I I I I I I I I i I t I I I 
orf 4 - 1 AVKTADKDSQWLKDVTEAYNSDAFKAYAHKRFEGYKSPAAWNEGAAKX 

250 260 270 280 
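The identity figures quoted for the pairwise comparisons can be reproduced from two already-aligned sequences of equal length. The sketch below assumes that gaps are written as `-` and that gapped columns are excluded from the overlap; the function name `percent_identity` is illustrative, and the alignment program used in the source may count the overlap differently.

```python
def percent_identity(aligned_a: str, aligned_b: str) -> tuple[float, int]:
    """Return (percent identity, overlap length) for two aligned sequences.

    Columns where either sequence has a gap ('-') are excluded from the
    overlap (an assumption of this sketch).
    """
    if len(aligned_a) != len(aligned_b):
        raise ValueError("aligned sequences must have the same length")
    columns = [(x, y) for x, y in zip(aligned_a, aligned_b) if x != "-" and y != "-"]
    if not columns:
        return 0.0, 0
    matches = sum(x == y for x, y in columns)
    return 100.0 * matches / len(columns), len(columns)


# Residues 61-70 of ORF4a-1 and ORF4-1, which differ at one position (P/A)
print(percent_identity("QIQPELEKKG", "QIQAELEKKG"))
```

A single mismatch over the full 287-residue overlap gives 286/287, which rounds to the 99.7% stated above.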
Homology with an outer membrane protein of Pasteurella haemolytica (accession Q08869).
ORF4 and this outer membrane protein show 33% aa identity in 91aa overlap:



10 20 

lip2 . pasha MNFKKLLGVALVSALALTACKDEKAQAP^ 

II I ::|| M l:|| :|: I 
0RF4 VXTPNPDGRTPCPSFLFETATTSGENMKTFFKTLSAAAL—ALILAACGFKKTARPPHPL 
110 120 130 140 150 



30 40 50 60 70 80 

10 1 ip2 . pasha - ATTAKTENKAPLKVGVMTGPEAQMTEVAVKIAKEKYGLDVELVQFTE YTQPNAALHSKD 

: :: I : |: :| ::|:: :: III I : I I : I I : 1 : : I I !1 : 
0RF4 LPPPTTARRKKEIVFGTTVGDFGDMVKEQIQAELEKKGYTVKLVEFTDYVRPNLALAEGE 
160 170 180 190 200 210 

15 90 100 110 120 130 140 

lip2. pasha LDANAFQTVPYLEQEVKDRGYKLAIIGNTLVWPIAAYSKKIKNISELKDGATVAIPNNAS 
I 

0RF4 L 



Homology with a predicted ORF from N. gonorrhoeae

ORF4 shows 93.6% identity over a 94aa overlap with a predicted ORF (ORF4.ng) from N. gonorrhoeae:



10 20 30 

orf4nm.pep MKTFFKTLSAAALALILAACGXQKDSAPAA 

M I I M I II: I: I I I I II II I llllllll 
orf4ng RANAVXTPNPDGRTPCLSFLFETATTSGENMKTFFKTLSTASLALILAACGGQKDSAPAA 
200 210 220 230 240 250 



40 50 60 70 80 89 

30 orf4nm.pep SASA-AADNGAAKKEIVFGTTVGDFGDMVKEQIQAELEKKGYTVKLVEFTDYVRPNLALA 

11:1 : I I I I M I I II II I t I I n I II II II I I I t I I I I t I I I i I II I I I I I I II I I 1 ! I 
orf4ng SAAAPSADNGAAKKEIVFGTTVGDFGDMVKEQIQAELEKKGYTVKLVEFTDYVRPNLALA 
260 270 280 290 300 310 



90

orf4nin.pep EGEL 
11 II 

orf4ng EGELDINVFQHKPYLDDFECKEHNLDITEAFQVPTAPLGLYPGKLKSLEEVKDGSTVSAPN 
320 330 340 350 360 370 

The complete length ORF4ng nucleotide sequence <SEQ ID 223> was predicted to encode a
protein having amino acid sequence <SEQ ID 224>:



1 MKTFFKTLST ASLALILAAC GGQKDSAPAA SAAAPSADNG AAKKEIVFGT 

51 TVGDFGDMVK EQIQAELEKK GYTVKLVEFT DYVRPNLALA EGELDINVFQ 

101 HKPYLDDFKK EHNLDITEAF QVPTAPLGLY PGKLKSLEEV KDGSTVSAPN 

151 DPSNFARALV MLNELGWIKL KDGINPLTAS KADIAENLKN IKIVELEAAQ

201 LPRSRADVDF AVVNGNYAIS SGMKLTEALF QEPSFAYVNW SAVKTADKDS

251 QWLKDVTEAY NSDAFKAYAH KRFEGYKYPA AWNEGAAK* 

Further analysis revealed the complete length ORF4ng DNA sequence <SEQ ID 225> to be: 

1 atgAAAACCT TCTTCAAAAC cctttccgcc gccgcaCTCG CGCTCATCCT 

51 CGCAGCCTGc ggCggtcaAA AAGACAGCGC GCCCgcagcc tctgcCGCCG

101 CCCCTTCTGC CGATAACGgc gCgGCGAAAA AAGAAAtcgt ctTCGGCACG

151 Accgtgggcg acttcggcgA TAtggTCAAA GAACAAATCC AagcCGAgct 

201 gGAGAAAAAA GgctACACcg tcAAattggt cgaatttacc gactatgtGC 

251 gCCCGAATCT GGCATTGGCG GAGGGCGAGT TGGACATCAA CGTCTTCCAA 

301 CACAAACCCT ATCTTGACGA TTTCAAAAAA GAACACAACC TGGACATCAC

351 CGAAGCCTTC CAAGTGCCGA CCGCGCCTTT GGGACTGTAT CCGGGCAAAC 

401 TGAAATCGCT GGAAGAAGTC AAAGACGGCA GCACCGTATC CGCGCCCAac 

451 gACccgTCCA ACTTCGCACG CGCCTTGGTG ATGCTGAACG AACTGGGTTG 

501 GATCAAACTC AAAGACGGCA TCAATCCGCT GACCGCATCC AAAGCCGACA 

551 TCGCGGAAAA CCTGAAAAAC ATCAAAATCG TCGAGCTTGA AGCCGCACAA
601 CTGCCGCGCA GCCGCGCCGA CGTGGATTTT GCCGTCGTCA ACGGCAACTA
651 CGCCATAAGC AGCGGCATGA AGCTGACCGA AGCCCTGTTC CAAGAGCCGA
701 GCTTTGCCTA TGTCAACTGG TCTGCCgtcA AAACCGCCGA CAAAGACAGC
751 CAATGGCTTA AAGACGTAAC CGAGGCCTAT AACTCCGACG CGTTCAAAGC
801 CTACGCGCAC AAACGCTTCG AGGGCTACAA ATACCCTGCC GCATGGAATG
851 AAGGCGCAGC CAAATAA



This encodes a protein having amino acid sequence <SEQ ID 226; ORF4ng-1>:

1 MKTFFKTLSA AALALILAAC GGQKDSAPAA SAAAPSADNG AAKKEIVFGT
51 TVGDFGDMVK EQIQAELEKK GYTVKLVEFT DYVRPNLALA EGELDINVFQ
101 HKPYLDDFKK EHNLDITEAF QVPTAPLGLY PGKLKSLEEV KDGSTVSAPN
151 DPSNFARALV MLNELGWIKL KDGINPLTAS KADIAENLKN IKIVELEAAQ
201 LPRSRADVDF AVVNGNYAIS SGMKLTEALF QEPSFAYVNW SAVKTADKDS
251 QWLKDVTEAY NSDAFKAYAH KRFEGYKYPA AWNEGAAK*



This shows 97.6% identity in 288 aa overlap with ORF4-1:

10 20 30 40 50 59 

orf 4-1 . pep MKTFFKTLSAAALALILAACGGQKDSAPAASASA-AADNGAAKKEIVFGTTVGDFGDMVK 
I I t I I I I I M I I f I I I I I I i I t I I I I t M I I I : t : I M I I I I I I I t I I 1 I i I I I i I I t I 
or f 4ng- 1 MKTFFKTLSAAALALILAACGGQKDSAPAASAAAPSADNGAAKKEIVFGTTVGDFGDMVK 

10 20 30 40 50 60 

60 70 80 90 100 110 119 

orf4-1.pep EQIQAELEKKGYTVKLVEFTDYVRPNLALAEGELDINVFQHKPYLDDFKKEHNLDITEVF
IIIIIIIIIIMIIIIIMIIIIiilllllllllllMllliltlllllMltlltll:! 
orf4ng-l EQIQAELEKKGYTVKLVEFTDYVRPNLALAEGELDINVFQHKPYLDDFKKEHNLDITEAF 

70 80 90 100 110 120 






120 130 140 150 160 170 179 

orf 4-1. pep QVPTAPLGLYPGKLKSLEEVKDGSTVSAPNDPSNFARVLVMLDELGWIKLKDGINPLTAS 
I I I I I I I t I I I I I I I t t I I I I I I I I I I I I I M I I I I I : I It I : I I I M I I I I t I i I I I I I 
orf4ng-l QVPTAPLGLYPGKLKSLEEVKDGSTVSAPNDPSNFARALVMLNELGWIKLKDGINPLTAS 

130 140 150 160 170 180 






180 190 200 210 220 230 239 

orf 4-1. pep KADIAENLKNIKIVELEAAQLPRSRADVDFAWNGNYAISSGMKLTEALFQEPSFAYVNW 
I I i I I I I t I I M I I I I t I I I I I M I I M I I i I I t I I t I I I t I t I I I M I I I I I I I I I I I I 
orf4ng-l KADIAENLKN IK I VELEAAQLPRSRADVD FA WNGNYAISSGMKLTEALFQE PS FAYVNW 

190 200 210 220 230 240 






240 250 260 270 280 

orf 4-1 . pep SAVKTADKDSQWLKDVTEAYNSDAFKAYAHKRFEGYKSPAAWNEGAAKX 
I n M I I I I t M I t I i I t I i { [ I I I M I I I I I I I I M M I I I It t I M 
orf4ng-l SAVKTADKDSQWLKDVTEAYNSDAFKAYAHKRFEGYKYPAAWNEGAAKX 

250 260 270 280 



In addition, ORF4ng-1 shows significant homology with an outer membrane protein from the
database:

ID LIP2_PASHA STANDARD; PRT; 276 AA.
AC Q08869;
DT 01-NOV-1995 (REL. 32, CREATED)
DT 01-NOV-1995 (REL. 32, LAST SEQUENCE UPDATE)
DT 01-NOV-1995 (REL. 32, LAST ANNOTATION UPDATE)
DE 28.2 KD OUTER MEMBRANE PROTEIN PRECURSOR. . . .
SCORES Init1: 279 Initn: 416 Opt: 494

Smith-Waterman score: 494; 36.0% identity in 275 aa overlap

10 20 30 40 50 

orf 4ng-l . pep MKTFFKTLSAAAL— ALILAACGGQKDSAPAASAAAPSADNGAAKKEIVFGTTVGDFGDM 
I I I : : I I I I i : I I : [ : t I I : : I : : : I I I I : : I : : I 

lip2_pasha MNFKKLLGVALVSALALTACKDEKAQAPATTA KTENKAPLK VGVMTGPEAQM 

60 10 20 30 40 50 

60 70 80 90 100 110 

orf 4ng-l . pep VKEQIQAELEKKGYTVKLVEFTDYVRPNLALAEGELDINVFQHKPYLDDFKKEHNLDITE 
:: :: III I : M : I I : I : : I I II :ll |:lt III:: j::: :: 
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lip2 pasha TEVAVKIAKEKYGLDVELVQFTEYTQPNAALHSKDLDANAFQTVPYLEQEVKDRGYKLAI 
~ 60 70 80 90 100 110 

120 130 140 150 160 170 

orf4ng-1.pep AFQVPTAPLGLYPGKLKSLEEVKDGSTVSAPNDPSNFARALVMLNELGWIKLKDGINPLT

:: : |:: I |:|:: |:||t:||: II: II llll::|: I :llll I : 
lip2 pasha IGNTLVWPIAAYSKKIKNISELKDGATVAIPNNASNTARALLLLQAHGLLKLKDPKN-VF 
120 130 140 150 160 170 

180 190 200 210 220 230

orf 4ng-l . pep ASKADIAENLKNIKIVELEAAQLPRSRADVDFAWNGNYAISSGMKLTE— ALFQEPSFA 

I : : II tl II I I II : : : : I I I I : : II : I : : t 1 : : I : : : : : : 
lip2 pasha ATENDIIENPKNIKIVQADTSLLTRMLDDVELAVINNTYAGQAGLSPDKDGIIVESKDSP 
180 190 200 210 220 230 


240 250 260 270 280 289 

orf 4ng-l . pep YVNWSAVKTADKDSQWLKDVTEAYNSDAFKAYAHKRFEGYKYPAAWNEGAAKX 

Ml : : :ll: |: ::::::: I I |:| 

lip2 pasha YVNLWSREDNKDDPRLQTFVKSFQTEEVFQEALKLFNGGWKGW 
240 250 260 270
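The Smith-Waterman score reported above is the score of the best local alignment between the two sequences. The following minimal sketch illustrates the recurrence only: the simple match, mismatch, and linear gap values are assumptions chosen for readability, not the substitution matrix or gap penalties that produced the score of 494, and the function name is hypothetical.

```python
def smith_waterman_score(a: str, b: str, match: int = 2,
                         mismatch: int = -1, gap: int = -2) -> int:
    """Best local alignment score between a and b (linear gap penalty).

    Each cell holds the best score of an alignment ending at that pair of
    positions, floored at zero so that an alignment may start anywhere.
    """
    prev = [0] * (len(b) + 1)
    best = 0
    for i in range(1, len(a) + 1):
        curr = [0] * (len(b) + 1)
        for j in range(1, len(b) + 1):
            s = match if a[i - 1] == b[j - 1] else mismatch
            curr[j] = max(0,
                          prev[j - 1] + s,   # align a[i-1] with b[j-1]
                          prev[j] + gap,     # gap in b
                          curr[j - 1] + gap) # gap in a
            best = max(best, curr[j])
        prev = curr
    return best


# Short conserved stretch shared by the two proteins (one substitution)
print(smith_waterman_score("LKDGSTV", "LKDGATV"))
```

Only two rows of the dynamic-programming matrix are kept, since the score alone (not the traceback) is needed here.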

Based on this analysis, including the homology with the outer membrane protein of Pasteurella
haemolytica, and on the presence of a putative prokaryotic membrane lipoprotein lipid attachment
site in the gonococcal protein, it was predicted that these proteins from N. meningitidis and
N. gonorrhoeae, and their epitopes, could be useful antigens for vaccines or diagnostics, or for
raising antibodies.
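A putative lipoprotein lipid attachment site can be located by scanning the N-terminal region for a lipobox-like motif ending in cysteine. The sketch below uses a simplified consensus, `[LVI][ASTVI][GAS]C`, restricted to the first 40 residues; both the pattern and the window are assumptions made for illustration and are not the exact motif definition used in the analysis reported here.

```python
import re

# Simplified lipobox-like consensus (an assumption, not the exact database pattern)
LIPOBOX = re.compile(r"[LVI][ASTVI][GAS]C")


def find_lipobox_cysteine(protein: str, search_window: int = 40):
    """Return the 1-based position of the candidate lipid-attachment Cys.

    Only the first `search_window` residues are searched. Returns None
    when no lipobox-like motif is found.
    """
    m = LIPOBOX.search(protein[:search_window])
    # m.end() is the 0-based index just past the Cys, i.e. its 1-based position
    return m.end() if m else None


# N-terminus shared by ORF4-1 and ORF4ng-1
print(find_lipobox_cysteine("MKTFFKTLSAAALALILAACGGQKDSAPAA"))
```

For the N-terminus shared by ORF4-1 and ORF4ng-1, the candidate cysteine falls at residue 20, immediately after the predicted leader peptide.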

ORF4-1 (30kDa) was cloned in pET and pGex vectors and expressed in E. coli, as described above.
The products of protein expression and purification were analyzed by SDS-PAGE. Figures 8A and
8B show, respectively, the results of affinity purification of the His-fusion and GST-fusion
proteins. Purified His-fusion protein was used to immunise mice, whose sera were used for ELISA
(positive result), Western blot (Figure 8C), FACS analysis (Figure 8D), and a bactericidal assay
(Figure 8E). These experiments confirm that ORF4-1 is a surface-exposed protein, and that it is a
useful immunogen.

Figure 8F shows plots of hydrophilicity, antigenic index, and AMPHI regions for ORF4-1.
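A profile of the kind plotted in Figure 8F can be approximated by averaging a per-residue scale over a sliding window. The source does not state which scale or window length was used; the sketch below assumes the Kyte-Doolittle hydropathy scale and a 9-residue window purely for illustration. On this scale hydrophilic stretches appear as negative averages, so a hydrophilicity plot corresponds to the sign-inverted profile.

```python
# Kyte-Doolittle hydropathy values (the choice of scale is an assumption)
KYTE_DOOLITTLE = {
    "A": 1.8, "R": -4.5, "N": -3.5, "D": -3.5, "C": 2.5,
    "Q": -3.5, "E": -3.5, "G": -0.4, "H": -3.2, "I": 4.5,
    "L": 3.8, "K": -3.9, "M": 1.9, "F": 2.8, "P": -1.6,
    "S": -0.8, "T": -0.7, "W": -0.9, "Y": -1.3, "V": 4.2,
}


def sliding_hydropathy(protein: str, window: int = 9) -> list[float]:
    """Mean hydropathy for every full window along the sequence.

    Residues outside the 20 standard amino acids (e.g. 'X') score 0.0.
    """
    values = [KYTE_DOOLITTLE.get(aa, 0.0) for aa in protein.upper()]
    return [sum(values[i:i + window]) / window
            for i in range(len(values) - window + 1)]


# Leader region of ORF4-1 followed by the start of the mature protein
profile = sliding_hydropathy("MKTFFKTLSAAALALILAACGGQKDSAPAA")
print([round(v, 2) for v in profile])
```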
Example 27

The following partial DNA sequence was identified in N. meningitidis <SEQ ID 227>:

1 CCTCGTCGTC CTCGGCATGC TCCAGTTTCA AGGGGCGATT TACTCCAAGG 

51 CGGTGGAACG TATGCTCGGC ACGGTCATCG GGCTGGGCGC GGGTTTGGGC 

101 GTTTTATGGC TGAACCAGCA TTATTTCCAC GGCAACCTCC TCTTCTACCT 

151 CACCGTCGGC ACGGCAAGCG CACTGGCCGG CTGGGCGGCG GTCGGCAAAA

201 ACGGCTACGT CCCTmTGCTG GCAGGGCTGA CGATGTGTAT GCTCATCGGC 

251 GACAACGGCA GCGAATGGCT CGACAGCGGA CTCATGCGCG CCATGAACGT 

301 GCTCATCGGC GyGGCCATCG CCATCGCCGC CGCCAAACTG CTGCCGCTGA 

351 AATCCACACT GATGTGGCGT TTCATGCTTG CCGACAACCT GGCCGACTGC 

401 AGCAA/WVTGA TTGCCGAAAT CAGCAACGGC AGGCGCATGA CCCGCGAACG

451 CCTCGAGGAG AACATGGCGA AAATGCGCCA AATCAACGCA CGCATGGTCA 

501 AAAGCCGCAG CCATCTCGCC GCCACATCGG GCGAAAGCTG CATCAGCCCC 

551 GCCATGATGG AAGCCATGCA GCACGCCCAC CGTAAAATCG TCAACACCAC 

601 CGAGCTGCTC CTGACCACCG CCGCCAAGCT GCAATCTCCC AAACTCAACG 
651 GCAGCGAAAT CCGGCTGCTT GACCGCCACT TCACACTGCT CCAAAC....

701 GC AGACACGCCC GCCGCATCCG 

751 CATCGACACC GCCATCAACC CCGAACTGGA AGCCCTCGCC GAACACCTCC 

801 ACTACCAATG GCAGGGCTTC CTCTGGCTCA GCACCGATAT GCGTCAGGAA 

851 ATTTCCGCCC TCGTCATCCT GCTGCAACGC ACCCGCCGCA AATGGCTGGA

901 TGCCCACGAA CGCCAACACC TGCGCCAAAG CCTGCTTGA 

This corresponds to the amino acid sequence <SEQ ID 228; ORF8>:

1 PRRP RHAPVSRGDL LQGGGTYARH GHRAGRGFGR FMAEPALFPR

51 QPPLLPHRRH GKRTGRLGGG RQKRLRPXAG RADDVYAHRR QRQRMARQRT 

101 HARHERPHRR GHRHRRRQTA AAEIHTDVAF HACRQPGRLQ QNDCRNQQRQ

151 AHDPRTPRGE HGENAPNQRT HGQKPQPSRR HIGRKLHQPR HDGSHAARPP 

201 XNRQHHRAAP DHRRQAAISQ TQRQRNPAAX PPLHTAPN Q 

251 TRPPHPHRHR HQPRTGSPRR TPPLPMAGLP LAQHRYASGN FRPRHPAATH 

301 PPQMAGCPRT PTPAPKPA* 

Computer analysis of this amino acid sequence gave the following results:
Sequence motifs

ORF8 is proline-rich and has a distribution of proline residues consistent with a surface
localization. Furthermore, the presence of an RGD motif may indicate a possible role in bacterial
adhesion events.
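Both observations, proline content and the presence of an RGD motif, can be checked directly on the amino acid sequence. The sketch below is illustrative only: the function names are hypothetical, and the fragment used is the translation of the first 72 nucleotides of <SEQ ID 227> under the standard genetic code.

```python
def motif_positions(protein: str, motif: str) -> list[int]:
    """1-based start positions of every (possibly overlapping) motif occurrence."""
    n = len(motif)
    return [i + 1 for i in range(len(protein) - n + 1) if protein[i:i + n] == motif]


def residue_fraction(protein: str, residue: str) -> float:
    """Fraction of the sequence made up of the given residue."""
    return protein.count(residue) / len(protein) if protein else 0.0


# Translation of nucleotides 1-72 of the partial ORF8 DNA sequence
fragment = "PRRPRHAPVSRGDLLQGGGTYARH"
print(motif_positions(fragment, "RGD"))
print(residue_fraction(fragment, "P"))
```

The same two functions can be applied to the complete sequences to compare proline content between the meningococcal and gonococcal proteins.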

Homology with a predicted ORF from N. gonorrhoeae

ORF8 shows 86.5% identity over a 312aa overlap with a predicted ORF (ORF8.ng) from N.
gonorrhoeae:



orf8ng     1 MDRDDRLRRPRHAPVPRRDLLQRGGTYARYGHRAGRGFGRFMAEPALFPR 50
orf8.pep   1       PRRPRHAPVSRGDLLQGGGTYARHGHRAGRGFGRFMAEPALFPR 44

orf8ng    51 QPPLLPDHRHGKRTGRLGGGRQKRLRPYVGGADDVHAHRRQRQRMARQRP 100
orf8.pep  45 QPPLLPHRRHGKRTGRLGGGRQKRLRPXAGRADDVYAHRRQRQRMARQRT 94

orf8ng   101 DARDERPHRRRHRHCRRQTAAAEIHTDVAFHACRQPGRLQQNDCRNQQRQ 150
orf8.pep  95 HARHERPHRRGHRHRRRQTAAAEIHTDVAFHACRQPGRLQQNDCRNQQRQ 144

orf8ng   151 AYDARTFGAEYGQNAPNQRTHGQKPQPPRRHIGRKPHQPLHDGSHAARPP 200
orf8.pep 145 AHDPRTPRGEHGENAPNQRTHGQKPQPSRRHIGRKLHQPRHDGSHAARPP 194

orf8ng   201 QNRQHHRAAPDHRRQAAISQTQRQRNPAARPPLHTAPNRPATNRRPHQRQ 250
orf8.pep 195 XNRQHHRAAPDHRRQAAISQTQRQRNPAAXPPLHTAPN           Q 244

orf8ng   251 TRPPHPHRHRHQPRTGSPRRTPPLPMAGFPLAQHQYASGNFRPRHPPATH 300
orf8.pep 245 TRPPHPHRHRHQPRTGSPRRTPPLPMAGLPLAQHRYASGNFRPRHPAATH 294

orf8ng   301 PPQMAGCPRTPTPAPKPA* 319
orf8.pep 295 PPQMAGCPRTPTPAPKPA* 313

The complete length ORF8ng nucleotide sequence <SEQ ID 229> is predicted to encode a protein
having amino acid sequence <SEQ ID 230>:

1 MDRDDRLRRP RHAPVPRRDL LQRGGTYARY GHRAGRGFGR FMAEPALFPR 

51 QPPLLPDHRH GKRTGRLGGG RQKRLRPYVG GADDVHAHRR QRQRMARQRP 

101 DARDERPHRR RHRHCRRQTA AAEIHTDVAF HACRQPGRLQ QNDCRNQQRQ 

151 AYDARTFGAE YGQNAPNQRT HGQKPQPPRR HIGRKPHQPL HDGSHAARPP
201 QNRQHHRAAP DHRRQAAISQ TQRQRNPAAR PPLHTAPNRP ATNRRPHQRQ
251 TRPPHPHRHR HQPRTGSPRR TPPLPMAGFP LAQHQYASGN FRPRHPPATH 
301 PPQMAGCPRT PTPAPKPA* 

Based on the sequence motifs in these proteins, it is predicted that the proteins from N. meningitidis
and N. gonorrhoeae, and their epitopes, could be useful antigens for vaccines or diagnostics, or for
raising antibodies.



Example 28 

The following partial DNA sequence was identified in N.meningitidis <SEQ ID 231>: 

1 ..GAAATCAGCC TGCGGTCCGA CNACAGGCCG GTTTCCGTGN CGAAGCGGCG 
51 GGATTCGGAA CGTTTTCTGC TGTTGGACGG CGGCAACAGC CGGCTCAAGT 
101 GGGCGTGGGT GGAAAACGGC ACGTTCGCAA CCGTCGGTAG CGCGCCGTAC 
151 CGCGATTTGT CGCCTTTGGG CGCGGAGTGG GCGGAAAAGG CGGATGGAAA 
201 TGTCCGCATC GTCGGTTGCG CTGTGTGCGG AGAATTCAAA AAGGCACAAG
251 TGCAGGAACA GCTCGCCCGA AAAATCGAGT GGCTGCCGTC TTCCGCACAG 
301 GCTTT.GGCA TACGCAACCA CTACCGCCAC CCCGAAGAAC ACGGTTCCGA 
351 CCGCTGGTTC AACGCCTTGG GCAGCCGCCG CTTCAGCCGC AACGCCTGCG
401 TCGTCGTCAG TTGCGGCACG GCGGTAACGG TTGACGCGCT CACCGATGAC 
451 GGACATTATC TCGGAGA.GG AACCATCATG CCCGGTTTCC ACCTGATGAA 
501 AGAATCGCTC GCCGTCCGAA CCGCCAACCT CAACCGGCAC GCCGGTAAGC 
551 GTTATCCTTT CCCGACCGG. . 

This corresponds to the amino acid sequence <SEQ ID 232; ORF61>: 

1 ..EISLRSDXRP VSVXKRRDSE RFLLLDGGNS RLKWAWVENG TFATVGSAPY 
51 RDLSPLGAEW AEKADGNVRI VGCAVCGEFK KAQVQEQLAR KIEWLPSSAQ 
101 AXGIRNHYRH PEEHGSDRWF NALGSRRFSR NACVVVSCGT AVTVDALTDD
151 GHYLGXGTIM PGFHLMKESL AVRTANLNRH AGKRYPFPT..

Further work revealed the complete nucleotide sequence <SEQ ID 233>: 

1 ATGACGGTTT TGAAGCTTTC GCACTGGCGG GTGTTGGCGG AGCTTGCCGA 

51 CGGTTTGCCG CAACACGTCT CGCAACTGGC GCGTATGGCG GATATGAAGC 

101 CGCAGCAGCT CAACGGTTTT TGGCAGCAGA TGCCGGCGCA CATACGCGGG 

151 CTGTTGCGCC AACACGACGG CTATTGGCGG CTGGTGCGCC CATTGGCGGT 

201 TTTCGATGCC GAAGGTTTGC GCGAGCTGGG GGAAAGGTCG GGTTTTCAGA 

251 CGGCATTGAA GCACGAGTGC GCGTCCAGCA ACGACGAGAT ACTGGAATTG 

301 GCGCGGATTG CGCCGGACAA GGCGCACAAA ACCATATGCG TGACCCACCT 

351 GCAAAGTAAG GGCAGGGGGC GGCAGGGGCG GAAGTGGTCG CACCGTTTGG 

401 GCGAGTGTCT GATGTTCAGT TTTGGCTGGG TGTTTGACCG GCCGCAGTAT 

451 GAGTTGGGTT CGCTGTCGCC TGTTGCGGCA GTGGCGTGTC GGCGCGCCTT 

501 GTCGCGTTTA GGTTTGGATG TGCAGATTAA GTGGCCCAAT GATTTGGTTG 

551 TCGGACGCGA CAAATTGGGC GGCATTCTGA TTGAAACGGT CAGGACGGGC 

601 GGCAAAACGG TTGCCGTGGT CGGTATCGGC ATCAATTTTG TCCTGCCCAA 

651 GGAAGTAGAA AATGCCGCTT CCGTGCAATC GCTGTTTCAG ACGGCATCGC 

701 GGCGGGGCAA TGCCGATGCC GCCGTGCTGC TGGAAACGCT GTTGGTGGAA 

751 CTGGACGCGG TGTTGTTGCA ATATGCGCGG GACGGATTTG CGCCTTTTGT 

801 GGCGGAATAT CAGGCTGCCA ACCGCGACCA CGGCAAGGCG GTATTGCTGT 

851 TGCGCGACGG CGAAACCGTG TTCGAAGGCA CGGTTAAAGG CGTGGACGGA 

901 CAAGGCGTTT TGCACTTGGA AACGGCAGAG GGCAAACAGA CGGTCGTCAG 

951 CGGCGAAATC AGCCTGCGGT CCGACGACAG GCCGGTTTCC GTGCCGAAGC 

1001 GGCGGGATTC GGAACGTTTT CTGCTGTTGG ACGGCGGCAA CAGCCGGCTC 

1051 AAGTGGGCGT GGGTGGAAAA CGGCACGTTC GCAACCGTCG GTAGCGCGCC 

1101 GTACCGCGAT TTGTCGCCTT TGGGCGCGGA GTGGGCGGAA AAGGCGGATG 

1151 GAAATGTCCG CATCGTCGGT TGCGCTGTGT GCGGAGAATT CAAAAAGGCA 

1201 CAAGTGCAGG AACAGCTCGC CCGAAAAATC GAGTGGCTGC CGTCTTCCGC 

1251 ACAGGCTTTG GGCATACGCA ACCACTACCG CCACCCCGAA GAACACGGTT 

1301 CCGACCGCTG GTTCAACGCC TTGGGCAGCC GCCGCTTCAG CCGCAACGCC 

1351 TGCGTCGTCG TCAGTTGCGG CACGGCGGTA ACGGTTGACG CGCTCACCGA 

1401 TGACGGACAT TATCTCGGGG GAACCATCAT GCCCGGTTTC CACCTGATGA 

1451 AAGAATCGCT CGCCGTCCGA ACCGCCAACC TCAACCGGCA CGCCGGTAAG 

1501 CGTTATCCTT TCCCGACCAC AACGGGCAAT GCCGTCGCCA GCGGCATGAT 

1551 GGATGCGGTT TGCGGCTCGG TTATGATGAT GCACGGGCGT TTGAAAGAAA 

1601 AAACCGGGGC GGGCAAGCCT GTCGATGTCA TCATTACCGG CGGCGGCGCG 
1651 GCAAAAGTTG CCGAAGCCCT GCCGCCTGCA TTTTTGGCGG AAAATACCGT
1701 GCGCGTGGCG GACAACCTCG TCATTTACGG GTTGTTGAAC ATGATTGCCG 
1751 CCGAAGGCAG GGAATATGAA CATATTTAA 

This corresponds to the amino acid sequence <SEQ ID 234; ORF61-1>:

1 MTVLKLSHWR VLAELADGLP QHVSQLARMA DMKPQQLNGF WQQMPAHIRG
51 LLRQHDGYWR LVRPLAVFDA EGLRELGERS GFQTALKHEC ASSNDEILEL
101 ARIAPDKAHK TICVTHLQSK GRGRQGRKWS HRLGECLMFS FGWVFDRPQY
151 ELGSLSPVAA VACRRALSRL GLDVQIKWPN DLVVGRDKLG GILIETVRTG
201 GKTVAVVGIG INFVLPKEVE NAASVQSLFQ TASRRGNADA AVLLETLLVE
251 LDAVLLQYAR DGFAPFVAEY QAANRDHGKA VLLLRDGETV FEGTVKGVDG
301 QGVLHLETAE GKQTVVSGEI SLRSDDRPVS VPKRRDSERF LLLDGGNSRL
351 KWAWVENGTF ATVGSAPYRD LSPLGAEWAE KADGNVRIVG CAVCGEFKKA
401 QVQEQLARKI EWLPSSAQAL GIRNHYRHPE EHGSDRWFNA LGSRRFSRNA
451 CVVVSCGTAV TVDALTDDGH YLGGTIMPGF HLMKESLAVR TANLNRHAGK
501 RYPFPTTTGN AVASGMMDAV CGSVMMMHGR LKEKTGAGKP VDVIITGGGA
551 AKVAEALPPA FLAENTVRVA DNLVIYGLLN MIAAEGREYE HI*



Figure 9 shows plots of hydrophilicity, antigenic index, and AMPHI regions for ORF61-1. Further
computer analysis of this amino acid sequence gave the following results:

Homology with the baf protein of B. pertussis (accession number U12020)
ORF61 and baf protein show 33% aa identity in 166aa overlap:

orf61 23 LLLDGGNSRLKWAWVE-NGTFATVGSAPYR DLSPLGAEWAEKADGNVRIVGCAVCG 77
+L+D GNSRLK W + + A AP DL LG A R +G V G
baf 3 ILIDSGNSRLKVGWFDPDAPQAAREPAPVAFDNLDLDALGRWLATLPRRPQRALGVNVAG 62

orf61 78 EFKKAQVQEQLAR KIEWLPSSAQAXGIRNHYRHPEEHGSDRW FNALGSRRFSRN 131
+ + L I WL + A G+RN YR+P++ G+DRW L +
baf 63 lARGEAIAATIJ^AGGCDIRWIJUiQPUmGLRNGYRNPDQLGADRWACMVGVLARQPSVHP 122

orf61 132 ACVVVSCGTAVTVDALTDDGHYLGXGTIMPGFHLMKESLAVRTANL 177
+V S GTA T+D + D + G G I+PG +M+ +LA TA+L
baf 123 PLLVASFGTATTLDTIGPDNVFPG-GLILPGPAMMRGALAYGTAHL 167

Homology with a predicted ORF from N. meningitidis (strain A)

ORF61 shows 97.4% identity over a 189aa overlap with an ORF (ORF61a) from strain A of N.
meningitidis:

10 20 30 

orf 61 .pep EISLRSDXRPVSVXKRRDSERFLLLDGGNS 

I I I ! I t I Mill M I I I I I M 1) I i I M 
orf 61a TVFEGTVKGVDGQGVLHLETAEGKQTWSGEISLRSDDRPVSVPKRRDSERFLLLDGGNS 
290 300 310 320 330 340 

40 50 60 70 80 90 

or f 61 . pep RLKWAWVENGTFATVGSAPYRDLSPLGAEWAEKADGNVRIVGCAVCGEFKKAQVQEQLAR 
i I I I 1 M I I I I I I i I I I I 1 I I M I t I I I I t I I I : I I I I I I I I I I t t M I I I M M I I I t I 
orf61a RLKWAWVENGTFATVGSAPYRDLSPLGAEWAEKVDGNVRIVGCAVCGEFKKAQVQEQLAR

350 360 370 380 390 400 

100 110 120 130 140 150 

orf61.pep KIEWLPSSAQAXGIRNHYRHPEEHGSDRWFNALGSRRFSRNACVVVSCGTAVTVDALTDD
I I I I M I I I I I I It I I I I I I I I I i I t I t I i f I I M I 1 I I I I I I t I I M I I I I M I I I I I 
orf61a KIEWLPSSAQALGIRNHYRHPEEHGSDRWFNALGSRRFSRNACVVVSCGTAVTVDALTDD
410 420 430 440 450 460 

160 170 180 189 

or f 61 . pep GHYLGXGTIMPGFHLMKESLAVRTANLNRHAGKRYPFPT 

1 1 1 M 1 1 1 1 1 1 1 1 1 i 1 1 1 1 1 1 1 1 i 1 1 1 i I M i 1 1 1 1 1 1 

orf 61a GHYLG-GTIMPGFHLMKESLAVRTANLNRHAGKRYPFPTTTGNAVASGMMDAVCGSVMMM 
470 480 490 500 510 520 



orf61a HGRLKEKTGAGKPVDVIITGGGAAKVAEALPPAFLAENTVRVADNLVIHGLLNLIAAEGG
530 540 550 560 570 580

The complete length ORF61a nucleotide sequence <SEQ ID 235> is:



1 ATGACGGTTT TGAAGCCTTC GCACTGGCGG GTGTTGGCGG AGCTTGCCGA
51 CGGTTTGCCG CAACACGTCT CGCAACTGGC GCGTATGGCG GATATGAAGC
101 CGCAGCAGCT CAACGGTTTT TGGCAGCAGA TGCCGGCGCA CATACGCGGG
151 CTGTTGCGCC AACACGACGG CTATTGGCGG CTGGTGCGCC CATTGGCGGT
201 TTTCGATGCC GAAGGTTTGC GCGAGCTGGG GGAAAGGTCG GGTTTTCAGA
251 CGGCATTGAA GCACGAGTGC GCGTCCAGCA ACGACGAGAT ACTGGAATTG
301 GCGCGGATTG CGCCGGACAA GGCGCACAAA ACCATATGTG TGACCCACCT
351 GCAAAGTAAG GGCAGGGGGC GGCAGGGGCG GAAGTGGTCG CACCGTTTGG
401 GCGAGTGTCT GATGTTCAGT TTTGGCTGGG TGTTTGACCG GCCGCAGTAT
451 GAGTTGGGTT CGCTGTCGCC TGTTGCGGCA GTGGCGTGCC GGCGCGCCTT
501 GTCGCGTTTG GGTTTGAAAA CGCAAATCAA GTGGCCAAAC GATTTGGTCG
551 TCGGACGCGA CAAATTGGGC GGCATTCTGA TTGAAACGGT CAGGACGGGC
601 GGCAAAACGG TTGCCGTGGT CGGTATCGGC ATCAATTTCG TGCTGCCCAA
651 GGAAGTGGAA AACGCCGCTT CCGTGCAATC GCTGTTTCAG ACGGCATCGC
701 GGCGGGGAAA TGCCGATGCC GCCGTGTTGC TGGAAACGCT GTTGGCGGAA
751 CTTGATGCGG TGTTGTTGCA ATATGCGCGG GACGGATTTG CGCCTTTTGT
801 GGCGGAATAT CAGGCTGCCA ACCGCGACCA CGGCAAGGCG GTATTGCTGT
851 TGCGCGACGG CGAAACCGTG TTCGAAGGCA CGGTTAAAGG CGTGGACGGA
901 CAAGGCGTTC TGCACTTGGA AACGGCAGAG GGCAAACAGA CGGTCGTCAG
951 CGGCGAAATC AGCCTGCGGT CCGACGACAG GCCGGTTTCC GTGCCGAAGC
1001 GGCGGGATTC GGAACGTTTT CTGCTGTTGG ACGGCGGCAA CAGCCGGCTC
1051 AAGTGGGCGT GGGTGGAAAA CGGCACGTTC GCAACCGTCG GTAGCGCGCC
1101 GTACCGCGAT TTGTCGCCTT TGGGCGCGGA GTGGGCGGAA AAGGTGGATG
1151 GAAATGTCCG CATCGTCGGT TGCGCCGTGT GCGGAGAATT CAAAAAGGCA
1201 CAAGTGCAGG AACAGCTCGC CCGAAAAATC GAGTGGCTGC CGTCTTCCGC
1251 ACAGGCTTTG GGCATACGCA ACCACTACCG CCACCCCGAA GAACACGGTT
1301 CCGACCGCTG GTTCAACGCC TTGGGCAGCC GCCGCTTCAG CCGCAACGCC
1351 TGCGTCGTCG TCAGTTGCGG CACGGCGGTA ACGGTTGACG CGCTCACCGA
1401 TGACGGACAT TATCTCGGGG GAACCATCAT GCCCGGTTTC CACCTGATGA
1451 AAGAATCGCT CGCCGTCCGA ACCGCCAACC TCAACCGGCA CGCCGGTAAG
1501 CGTTATCCTT TCCCGACCAC AACGGGCAAT GCCGTCGCCA GCGGCATGAT
1551 GGATGCGGTT TGCGGCTCGG TTATGATGAT GCACGGGCGT TTGAAAGAAA
1601 AAACCGGGGC GGGCAAGCCT GTCGATGTCA TCATTACCGG CGGCGGCGCG
1651 GCAAAAGTTG CCGAAGCCCT GCCGCCTGCA TTTTTGGCGG AAAATACCGT
1701 GCGCGTGGCG GACAACCTCG TCATTCACGG GCTGCTGAAC CTGATTGCCG
1751 CCGAAGGCGG GGAATCGGAA CATACTTAA



This encodes a protein having amino acid sequence <SEQ ID 236>: 



1 MTVLKPSHWR VLAELADGLP QHVSQLARMA DMKPQQLNGF WQQMPAHIRG
51 LLRQHDGYWR LVRPLAVFDA EGLRELGERS GFQTALKHEC ASSNDEILEL
101 ARIAPDKAHK TICVTHLQSK GRGRQGRKWS HRLGECLMFS FGWVFDRPQY
151 ELGSLSPVAA VACRRALSRL GLKTQIKWPN DLVVGRDKLG GILIETVRTG
201 GKTVAVVGIG INFVLPKEVE NAASVQSLFQ TASRRGNADA AVLLETLLAE
251 LDAVLLQYAR DGFAPFVAEY QAANRDHGKA VLLLRDGETV FEGTVKGVDG
301 QGVLHLETAE GKQTVVSGEI SLRSDDRPVS VPKRRDSERF LLLDGGNSRL
351 KWAWVENGTF ATVGSAPYRD LSPLGAEWAE KVDGNVRIVG CAVCGEFKKA
401 QVQEQLARKI EWLPSSAQAL GIRNHYRHPE EHGSDRWFNA LGSRRFSRNA
451 CVVVSCGTAV TVDALTDDGH YLGGTIMPGF HLMKESLAVR TANLNRHAGK
501 RYPFPTTTGN AVASGMMDAV CGSVMMMHGR LKEKTGAGKP VDVIITGGGA
551 AKVAEALPPA FLAENTVRVA DNLVIHGLLN LIAAEGGESE HT*

ORF61a and ORF61-1 show 98.5% identity in 591 aa overlap:



10 20 30 40 50 60 

orf61a.pep MTVLKPSHWRVLAELADGLPQHVSQLARMADMKPQQLNGFWQQMPAHIRGLLRQHDGYWR
M It I I I I t I I I It t I I I I I I I I I I M t I t tl I t I M I I t I I I I I M t M I I t I M I I t 
orf61-1 MTVLKLSHWRVLAELADGLPQHVSQLARMADMKPQQLNGFWQQMPAHIRGLLRQHDGYWR

10 20 30 40 50 60 



70 80 90 100 110 120 

orf61a.pep LVRPLAVFDAEGLRELGERSGFQTALKHECASSNDEILELARIAPDKAHKTICVTHLQSK
I I I I M I t I I t I t I I I I I I tl I I t I I I I M I I I I I I I I I I t M I I I t I t I t I I 1 I I i I I I 
orf 61-1 LVRPLAVFDAEGLRELGERSGFQTALKHECASSNDEILELARIAPDKAHKTICVTHLQSK 

70 80 90 100 110 120 



130 140 150 160 170 180 



or f 61a . pep GRGRQGRKWSHRLGECLMFSFGWVFDRPQYELGSLSPVAAVACRRALSRLGLKTQIKWPN 
I I I I I I I I M I I I I I I I I t I I M I I I i ) I 1 I I I I M M M [ I I I I I t I I I t I : I I M I I 
O r f 6 1 - 1 GRGRQGRKWS HRLGECLMFS FGWVFDRPQYELGS LS PVAAVACRRALSRI>GLDVQIKWPN 

130 140 150 160 170 180 

190 200 210 220 230 240 

or f 61a . pep DLWGRDKLGGILIETVRTGGKTVAWGIGINFVLPKEVENAASVQSLFQTASRRGNADA 
M I I ) I I I I I I i I I I I I I i I t I I i t I 1 I i t i I I 1 I I I i I I I I t I I i t I I I I I I I I I t I I I 
orf61-l DLWGRDKLGGILIETVRTGGKTVAWGIGINFVLPKEVENAASVQSLFQTASRRGNADA 

190 200 210 220 230 240 

250 260 270 280 290 300 

orf 61a . pep AVLLETLLAELDAVLLQYARDGFAPFVAEYQAANRDHGKAVLLLRDGETVFEGTVKGVDG 
I I M I I I I : t I I I I I I I I I I I I I M t I I I I I I I I I I I I I I I I I i I I I I I I I I I I I I I t I I 
orf 61-1 AVLLETLLVELDAVLLQYARDGFAPFVAEYQAANRDHGKAVLLLRDGETVFEGTVKGVDG 

250 260 270 280 290 300 

310 320 330 340 350 360 

orf 61a . pep QGVLHLETAEGKQTWSGEISLRSDDRPVSVPKRRDSERFLLLDGGNSRLKWAWVENGTF 
I I I I f I I I J I I I I I I I I I I I I i I I M t I I I t I i I M I t I I I t I I M I M I I I I I I I I I I I 
orf 61-1 QGVLHLETAEGKQTWSGEISLRSDDRPVSVPKRRDSERFLLLDGGNSRLKWAWVENGTF 

310 320 330 340 350 360 

370 380 390 400 410 420 

orf 61a . pep ATVGSAPYRDLSPLGAEWAEKVDGNVRIVGCAVCGEFKKAQVQEQLARKIEWLPSSAQAL 
I i I I I I I t I I I I t I i I I I M I: I I I I ] t I I I I I I M I I I i i I i I t I M I I I M t I I I I I I 
orf 61-1 ATVGSAPYRDLSPLGAEWAEKADGNVRIVGCAVCGEFKKAQVQEQLARKIEWLPSSAQAL 

370 380 390 400 410 420 

430 440 450 460 470 480 

or f 61a . pep GIRNHYRHPEEHGSDRWFNALGSRRFSRNACVWSCGTAVTVDALTDDGHYLGGTIMPGF 
I I I I I I I I I I I I I I I I M I I I I I M I I M I I M I M I t I I I I ! I I 1 I I I I I I I I t I I I i t 
orf 61-1 GIRNHYRHPEEHGSDRWFNALGSRRFSRNACWVSCGTAVTVDALTDDGHYLGGTIMPGF 

430 440 450 460 470 480 

490 500 510 520 530 540 

orf 61a . pep HLMKESIAVRTANLNRHAGKRYPFPTTTGNAVASGMMDAVCGSVMMMHGRLKEKTGAGKP 
I t t i I I I I M i I I I I I I I I t f I It M M I I I I I I I n t I I I I I I I I I I I I M 11 I I I I I I 
orf 61-1 HI^IKESIAVRTANLNRHAGKRYPFPTTTGNAVASGMMDAVCGSVMMMHGRLKEKTGAGKP 

490 500 510 520 530 540 

550 560 570 580 590 

or f 61a . pep VDVIITGGGAAKVAEALPPAFLAENTVRVADNLVIHGLLNLIAAEGGESEHTX 

I I I I I i I i I I t i I I I I I i I I I I I I t I I I I I I I I I I : M I I : I I I I I I it 
or f 6 1 - 1 VDVI ITGGGAAKVAEALPPAFLAENTVRVADNLVI YGLLNMI AAEGREYEHIX 

550 560 570 580 590 



Homology with a predicted ORF from N.gonorrhoeae

ORF61 shows 94.2% identity over a 189aa overlap with a predicted ORF (ORF61.ng) from N.
gonorrhoeae:






orf 61 .pep 
orf 61ng 
orf 61 .pep 
orf 61ng 
orf 61 .pep 
orf 61ng 
orf 61 .pep 
orf 61ng 



E I S LRS DXRPVS VXKRRDSERFLLLDGGNS 3 0 
I i I I t I I III t I 11111111:1111 

TVCEGTVKGVDGRGVLHLETAEGEQTWSGEISLRPDNRSVSVPKRPDSERFLLLEGGNS 211 

RLKWAWVENGTFATVGSAPYRDLSPLGAEWAEKADGNVRIVGCAVCGEFKKAQVQEQLAR 90 
llllllllllltllllllllllllllllillllttlllllllilllll llllhlllll 

RLKWAWVENGTFATVGSAPYRDLSPLGAEWAEKADGNVRIVGCAVCGESKKAQVKEQLAR 271 

KIEWLPSSAQAXGIRNHYRHPEEHGSDRWFNALGSRRFSRNACVWSCGTAVTVDALTDD 150 
I I I I II t II II I I I II I I I I II I I I I II I II I I I II II I II II M I ! t I I t I II t I M I 

KIEWLPSSAQ7UX5IRNHYRHPEEHGSDRWFNALGSRRFSRNACVWSCGTAVTVDALTDD 331 

GHYLGXGTIMPGFHLMKESLAVRTANLNRHAGKRYPFPT 189 
I I I t I I t I I I t II I II I II I I I I M I II I I I I t I I I I 

GHYLG-GTIMPGFHLMKESLAVRTANLNRPAGKRYPFPTTTGNAVASGMMDAVCGSIMMM 390 






An ORF61ng nucleotide sequence <SEQ ID 237> was predicted to encode a protein having amino
acid sequence <SEQ ID 238>:

1 WrSFGWAFDR PQYELGSLSP VAAIACRRAL GCLGLETQIK WPNDLVVGRD

51 KLGGILIETV RAGGKTVAVV GIGINFVLPK EVENAASVQS LFQTASRRGN

101 ADAAVLLETL LAELGAVLEQ YAEEGFAPFL NEYETANRDH GKAVLLLRDG

151 ETVCEGTVKG VDGRGVLHLE TAEGEQTVVS GEISLRPDNR SVSVPKRPDS

201 ERFLLLEGGN SRLKWAWVEN GTFATVGSAP YRDLSPLGAE WAEKADGNVR

251 IVGCAVCGES KKAQVKEQLA RKIEWLPSSA QALGIRNHYR HPEEHGSDRW

301 FNALGSRRFS RNACVVVSCG TAVTVDALTD DGHYLGGTIM PGFHLMKESL

351 AVRTANLNRP AGKRYPFPTT TGNAVASGMM DAVCGSIMMM HGRLKEKNGA 

401 GKPVDVIITG GGAAKVAEAL PPAFLAENTV RVADNLVIHG LLNLIAAEGG 

451 ESEHA* 

Further analysis revealed the complete gonococcal DNA sequence <SEQ ID 239> to be:

1 ATGACGGTTT TGAAGCCTTC GCATTGGCGG GTGTTGGCGG AGCTTGCCGA 

51 CGGTTTGCCG CAACACGTAT CGCAATTGGC GCGTGAGGCG GACATGAAGC 

101 CGCAGCAGCT CAACGGTTTT TGGCAGCAGA TGCCGGCGCA TATACGCGGG 

151 CTGTTGCGCC AACACGACGG CTATTGGCGG CTGGTGCGCC CCTTGGCGGT 

201 TTTCGATGCC GAAGGTTTGC GCGATCTGGG GGAAAGGTCG GGTTTTCAGA 

251 CGGCATTGAA GCACGAGTGC GCGTCCAGCA ACGACGAGAT ACTGGAATTG 

301 GCGCGGATTG CGCCGGACAA GGCGCACAAA ACCATATGCG TGACCCACCT 

351 GCAAAGTAAG GGCAGGGGGC GGCAGGGGCG GAAGTGGTCG CACCGTTTGG 

401 GCGAGTGCCT GATGTTCAGT TTCGGCTGGG CGTTTGACCG GCCGCAGTAT 

451 GAGTTGGGTT CGCTGTCGCC TGTTGCGGCA CTTGCGTGCC GGCGCGCTTT 

501 GGGGTGTTTG GGTTTGGAAA CGCAAATCAA GTGGCCAAAC GATTTGGTCG 

551 TCGGACGCGA CAAATTGGGC GGCATTCTGA TTGAAACAGT CAGGGCGGGC 

601 GGTAAAACGG TTGCCGTGGT CGGTATCGGC ATCAATTTCG TGCTGCCCAA 

651 GGAAGTGGAA AACGCCGCTT CCGTGCAGTC GCTGTTTCAG ACGGCATCGC 

701 GGCGGGGCAA TGCCGATGCC GCCGTATTGC TGGAAACATT GCTTGCGGAA 

751 CTGGGCGCGG TGTTGGAACA ATATGCGGAA GAAGGGTTCG CGCCATTTTT 

801 AAATGAGTAT GAAACGGCCA ACCGCGACCA CGGCAAGGCG GTATTGCTGT 

851 TGCGCGACGG CGAAACCGTG TGCGAAGGCA CGGTTAAAGG CGTGGACGGA 

901 CGAGGCGTTC TGCACTTGGA AACGGCAgaa ggcgaACAGa cggtcgtcag 

951 cggcgaaaTC AGcctGCggc ccgacaacaG GTCGGtttcc gtgccgaagc 

1001 ggccggatTC GgaacgtTTT tTGCtgttgg aaggcgggaa cagccgGCTC 

1051 AAGTGGGCGT GggtggAAAa cggcacgttc gcaaccgtgg gcagcgcgCc 

1101 gtaCCGCGAT TTGTCGCCTT TGGGCGCGGA GTGGGCGGAA AAGGCGGATG 

1151 GAAATGTCCG CATCGTCGGT TGCGCCGTGT GCGGAGAATC CAAAAAGGCA 

1201 CAAGTGAAGG AACAGCTCGC CCGAAAAATC GAGTGGCTGC CGTCTTCCGC

1251 ACAGGCTTTG GGCATACGCA ACCACTACCG CCACCCCGAA GAACACGGTT 

1301 CCGACCGTTG GTTCAACGCC TTGGGCAGCC GCCGCTTCAG CCGCAACGCC 

1351 TGCGTCGTCG TCAGTTGCGG CACGGCGGTA ACGGTTGACG CGCTCACCGA 

1401 TGACGGACAT TATCTCGGCG GAACCATCAT GCCCGGCTTC CACCTGATGA 

1451 AAGAATCGCT CGCCGTCCGA ACCGCCAACC TCAACCGCCC CGCCGGCAAA 

1501 CGTTACCCTT TCCCGACCAC AACGGGCAAC GCCGTCGCAA GCGGCATGAT 

1551 GGACGCGGTT TGCGGCTCGA TAATGATGAT GCACGGCCGT TTGAAAGAAA 

1601 AAAACGGCGC GGGCAAGCCT GTCGATGTCA TCATTACCGG CGGCGGCGCG 

1651 GCGAAAGTCG CCGAAGCCCT GCCGCCTGCA TTTTTGGCGG AAAATACCGT 

1701 GCGCGTGGCG GACAACCTCG TCATCCACGG GCTGCTGAAC CTGATTGCCG 

1751 CCGAAGGCGG GGAATCGGAA CACGCTTAA 

This corresponds to the amino acid sequence <SEQ ID 240; ORF61ng-1>:

1 MTVLKPSHWR VLAELADGLP QHVSQLAREA DMKPQQLNGF WQQMPAHIRG 

51 LLRQHDGYWR LVRPLAVFDA EGLRDLGERS GFQTALKHEC ASSNDEILEL 

101 ARIAPDKAHK TICVTHLQSK GRGRQGRKWS HRLGECLMFS FGWAFDRPQY

151 ELGSLSPVAA LACRRALGCL GLETQIKWPN DLVVGRDKLG GILIETVRAG

201 GKTVAVVGIG INFVLPKEVE NAASVQSLFQ TASRRGNADA AVLLETLLAE

251 LGAVLEQYAE EGFAPFLNEY ETANRDHGKA VLLLRDGETV CEGTVKGVDG

301 RGVLHLETAE GEQTVVSGEI SLRPDNRSVS VPKRPDSERF LLLEGGNSRL

351 KWAWVENGTF ATVGSAPYRD LSPLGAEWAE KADGNVRIVG CAVCGESKKA

401 QVKEQLARKI EWLPSSAQAL GIRNHYRHPE EHGSDRWFNA LGSRRFSRNA

451 CVVVSCGTAV TVDALTDDGH YLGGTIMPGF HLMKESLAVR TANLNRPAGK

501 RYPFPTTTGN AVASGMMDAV CGSIMMMHGR LKEKNGAGKP VDVIITGGGA 

551 AKVAEALPPA FLAENTVRVA DNLVIHGLLN LIAAEGGESE HA* 



ORF61ng-1 and ORF61-1 show 93.9% identity in 591 aa overlap:



orf 61ng-l .pep 

orf61-l 

orf 61ng-l .pep 

orf61-l 

orf 61ng-l .pep 

orf61-l 

orf 61ng-l .pep 

orf61-l 

orf 61ng-l .pep 

orf61-l 

orf 61ng-l .pep 

orf61-l 

orf 61ng-l.pep 

orf61-l 

orf 61ng-l .pep 

orf61-l 



MTVLKPSHWRVlJ^lADGLPQHVSQIJy^DMKPQQLNGFWQQMPAHIRGLLRQHDGYWR 60 
Mill M I t I I I I I I I II I II I I I I I t I I I I I I I I I I I II t I I I I M II I I I I I M I t 
MTVIJCLSHVmVIJ^IJUDGLPQHVSQIJU^DMKPQQI^GEWQQMPAHIRGLLRQHDG™^ 60 

LVRPLAVFDAEGLRDLGERSGFQTALKHECASSNDEILELARIAPDKAHKTICVTHLQSK 120 
f I II I I II I I II II : I I II I I M I I II I I I I I I II I I I II II I t I t II t I I I i I I I II I i 
LVRPLAVFDAEGLRELGERSGFQTALKHECASSNDEILELARIAPDKAHKTICVTHLQSK 120 

GRGRQGRKWSHRLGECLMFSFGWAFDRPQYELGSLSPVAALACRRALGCLGLETQIKWPN 180 
I II II I II I I I I t I II I II I I II : t I I II I i t I II I I II I : II II t t : I I I I I I I II 
GRGRQGRKWSHRLGECLMFSFGWVFDRPQYELGSLSPVAAVACRRALSRLGLDVQIKWPN 180 

DLWGRDECLGGILIETVRAGGKTVAWGIGINFVLPKEVENAASVQSLFQTASRRGNADA 240 
I II I I I I II I I I I I I II I : I II II II I I I II I I I I I I I 11 I I I I II I I I II I I II I II I I 
DLWGRDKLGGILIETVRTGGKTVAWGIGINFVLPKEVENAASVQSLFQTASRRGNADA 240 

AVLLETLLAELGAVLEQYAEEGFAPFLNEYETANRDHGKAVLLLRDGETVCEGTVKGVDG 300 
I I I I I I I I : II I I I i I t : : I I I II : I t : : 1 M II i t I II ) I I I I i I I I I I I II I i t 
AVLLETLLVELDAVLLQYARDGFAPFVAEYQAANRDHGKAVLLLRDGETVFEGTVKGVDG 300 

RGVLHLETAEGEQT WS GE I SLRPDNRS VS VPKRPDSERFLLLEGGNSRLKWAWVENGT F 3 6 0 
: I II I I I I I II : I II I I II I I I t 1:1 I I I II i It i II I I I : I M II I I I i II I II I I 
QG\nJlLETAEGKQTVVSGEISLRSDDRPVSVPKRRDSERFLLLDGGNSRLKWAWVENGTF 3 60 

ATVGSAPYRDLSPLGAEWAEKADGNVRIVGCAVCGESKKAQVKEQLARKIEWLPSSAQAL 420 

I I I II I I I I t I I II i t I i I I I I II t I I I I I II M II I II II : I I I I I I I II i I t I II II 
ATVGSAPYRDLSPLGAEWAEKADGNVRIVGCAVCGEFKKAQVQEQLARKIEWLPSSAQAL 420 

GIRNHYRHPEEHGSDRWFNALGSRRFSRNACWVSCGTAVTVDALTDDGHYLGGTIMPGF 480 
I I I I I M II t I I II I I II I I t I II I I I I I II I I I II I I I I I I I I I I I I I I I i I I I I I I I I 
GIRNHYRHPEEHGSDRWFNALGSRRFSRNACVWSCGTAVTVDALTDDGHYLGGTIMPGF 480 



orf 61ng-l .pep HLMKESLAVRTANLNRPAGKRYPFPTTTGNAVASGMMDAVCGSIMMMHGRLKEKNGAGKP 540 

I I I I I I I I I I I I I II I I I I 1 I M II I I I I M I I It I I I I I I I : I I I I I I I I I I : i I I II 
orf 61-1 HLMKESLAVRTANLNRHAGKRYPFPTTTGNAVASGMMDAVCGSVMMMHGRLKEKTGAGKP 540 

orf 61ng-l . pep VDVIITGGGAAKVAEALPPAFLAENTVRVADNLVIHGLLNLIAAEGGESEHAX 593 

I I I I I I II 1 I I I I I I II I I I I I t t I I I II I I I II I : I I I I : I I I 11 I It 
orf 61-1 VDVriTGGGAAKVAEALPPAFLAENTVRVADNLVIYGLLNMIAAEGREYEHIX 593 

Based on this analysis, including the homology with the baf protein of B.pertussis and the presence
of a putative prokaryotic membrane lipoprotein lipid attachment site, it is predicted that these
proteins from N.meningitidis and N.gonorrhoeae, and their epitopes, could be useful antigens for
vaccines or diagnostics, or for raising antibodies.



Example 29

The following partial DNA sequence was identified in N.meningitidis <SEQ ID 241>:
1 ATGTTTTACC AAATCCTTGC CCTGATTATC TGGAGCAGCT CGTTTATTGC

51 CGCCAAATAT GTCTATGGCG GCATCGATCC CGCATTGATG GTCGGCGTGC

101 GCCTGCTAAT TGCCGCGCTG CCTGCACTGC CCGCCTGCCG CCGTCATGTC

151 GGCAAGATTC CGCGTGAGGA ATGGAAGCCG TTGCTGATTG TGTCGTTCGT

201 CAACTATGTG CTGACCCTGC TGCTTCAGTT TGTCGGGTTG AAATACACTT

251 CCGCCGCCAG CGCATCGGTC ATTGTCGGAC TCGAGCCGCT GCTGATGGTG

301 TTTGTCGGAC ACTTTTTCTT CAACGACAAA GCGCGTGCCT ACCACTGGAT

351 ATGCGGCGCG GCGGCATTTG CCGGTGTCGC GCTGCTGATG GCGGGCGGTG

401 CGGaAGAGGG CGGCGaAGTC GGCTGGTTCG GCTGCCTGCT GGTGTTGTTG

451 GCGGGCGCGG GCTTTTGTGC CGCTATGCGT CCGACGCAAA GGCTGATTGC

501 ACGCATCGGC GCACCGGCAT TCACATCTGT TTCCATTGCC GCCGCATCGT

551 TGATGTGCCT GCCGTTTTCG CTTGCTTTGG CGCAAAGTTA TACCGTGGAC

601 TGGAGCGTCG GGATGGTATT GTCGCTGCTG TATTTGGGTT TGGGGTGC..



This corresponds to the amino acid sequence <SEQ ID 242; ORF62>:

1 MFYQILALII WSSSFIAAKY VYGGIDPALM VGVRLLIAAL PALPACRRHV

51 GKIPREEWKP LLXVSFVNYV LTLLLQFVGL KYTSAASASV IVGLEPLLMV 

101 FVGHFFFNDK ARAYHWICGA AAFAGVALLM AGGAEEGGEV GWFGCLLVLL 

151 AGAGFCAAMR PTQRLIARIG APAFTSVSIA AASLMCLPFS LALAQSYTVD 

201 WSVGMVLSLL YLGLGC. . 

Further work revealed the complete nucleotide sequence <SEQ ID 243>: 



1 ATGTTTTACC AAATCCTTGC CCTGATTATC TGGAGCAGCT CGTTTATTGC 

51 CGCCAAATAT GTCTATGGCG GCATCGATCC CGCATTGATG GTCGGCGTGC 

101 GCCTGCTAAT TGCCGCGCTG CCTGCACTGC CCGCCTGCCG CCGTCATGTC 

151 GGCAAGATTC CGCGTGAGGA ATGGAAGCCG TTGCTGATTG TGTCGTTCGT 

201 CAACTATGTG CTGACCCTGC TGCTTCAGTT TGTCGGGTTG AAATACACTT 

251 CCGCCGCCAG CGCATCGGTC ATTGTCGGAC TCGAGCCGCT GCTGATGGTG 

301 TTTGTCGGAC ACTTTTTCTT CAACGACAAA GCGCGTGCCT ACCACTGGAT 

351 ATGCGGCGCG GCGGCATTTG CCGGTGTCGC GCTGCTGATG GCGGGCGGTG 

401 CGGAAGAGGG CGGCGAAGTC GGCTGGTTCG GCTGCCTGCT GGTGTTGTTG 

451 GCGGGCGCGG GCTTTTGTGC CGCTATGCGT CCGACGCAAA GGCTGATTGC 

501 ACGCATCGGC GCACCGGCAT TCACATCTGT TTCCATTGCC GCCGCATCGT 

551 TGATGTGCCT GCCGTTTTCG CTTGCTTTGG CGCAAAGTTA TACCGTGGAC 

601 TGGAGCGTCG GGATGGTATT GTCGCTGCTG TATTTGGGTT TGGGGTGCGG 

651 CTGGTACGCC TATTGGCTGT GGAACAAGGG GATGAGCCGT GTTCCTGCCA 

701 ATGTTTCGGG ACTGTTGATT TCGCTCGAAC CCGTCGTCGG CGTGCTGCTG 

751 GCGGTTTTGA TTTTGGGCGA ACACCTGTCG CCCGTGTCCG CCTTGGGCGT 

801 GTTTGTCGTC ATCGCCGCCA CCTTGGTTGC CGGCCGGCTG TCGCATCAAA 

851 AATAA 

This corresponds to the amino acid sequence <SEQ ID 244; ORF62-1>:

1 MFYQILALII WSSSFIAAKY VYGGIDPALM VGVRLLIAAL PALPACRRHV

51 GKIPREEWKP LLIVSFVNYV LTLLLQFVGL KYTSAASASV IVGLEPLLMV

101 FVGHFFFNDK ARAYHWICGA AAFAGVALLM AGGAEEGGEV GWFGCLLVLL

151 AGAGFCAAMR PTQRLIARIG APAFTSVSIA AASLMCLPFS LALAQSYTVD

201 WSVGMVLSLL YLGLGCGWYA YWLWNKGMSR VPANVSGLLI SLEPVVGVLL

251 AVLILGEHLS PVSALGVFVV IAATLVAGRL SHQK*

Computer analysis of this amino acid sequence gave the following results: 



Homology with hypothetical transmembrane protein HI0976 of H.influenzae (accession number Q57147)
ORF62 and HI0976 show 50% aa identity in 114aa overlap:

Orf62 1 MFYQILALIIWSSSFIAAKYVYGGIDPALMVGVRXXXXXXXXXXXCRRHVGKIPREEWKP 60 

M YQILAL+IWSSS IKY +DP L+V VR R KI + K 

HI0976 1 MLYQILALLIWSSSLIVGKLTYSMMDPVLWQVRLIIAMIIVMPLFLRRWKKIDKPMRKQ 60

Orf62 61 LLIVSFVNYVLTLLLQFVGLKYTSAASASVIVGLEPLLMVFVGHFFFNDKARAY 114

L ++F NY LLQF+GLKYTSA+SA ++GLEPLL+VFVGHFFF K +
HI0976 61 LWWLAFFNYTAVFLLQFIGLKYTSASSAVTMIGLEPLLWFVGHFFFKTKQNGF 114 

Homology with a predicted ORF from N.meningitidis (strain A)

ORF62 shows 99.5% identity over a 216aa overlap with an ORF (ORF62a) from strain A of N.
meningitidis:



10 20 30 40 50 60 

orf 62 .pep MFYQILALIIWSSSFIA AKYVYGGID PALMVGVRLLIAALPAL PACRRHVGKIPREEWKP 

1 1 1 1 i t [ 1 1 1 1 1 1 1 n 1 1 1 1 1 1 M I M 1 1 1 1 1 1 1 1 1 1 1 i 1 1 1 i i 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 

orf 62a MFYQILALIIWSSSFIA AKYVYGGID PALMVGVRLLIAALPAL PACRRHVGKIPREEWKP 
10 20 30 40 50 60 



70 80 90 100 110 120 

orf 62 .pep L LIVSFVNYVLTLLLQFV GLKYTS AASASVIVGLEPLLMVFV GHFFFNDKARAYH WICGA 
I I I I M ( I I I I I I M I n I I M I I I I I t I i I 1 I I I I I I M I I t I I I I I M I I I I I I I I I I 
orf 62a LLIVSFVNYVLTLLLQFV GLKYTS AASASVIVGLEPLLMVFV GHFFFNDKARAYH WICGA 

70 80 90 100 110 120 



130 140 150 160 170 180

orf62.pep AAFAGVALLMAGGAEEGGEVGWFGCLLVLLAGAGFCAAMRPTQRLIARIGAPAFTSVSIA
||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
orf62a AAFAGVALLMAGGAEEGGEVGWFGCLLVLLAGAGFCAAMRPTQRLIARIGAPAFTSVSIA
130 140 150 160 170 180

190 200 210 

orf 62 . pep AASLMCLPFSLALA QSYTVDWSVGMVLSLLYLGLGC 
MIMIMIIIItllltlllttlllllllini:!! 
orf 62a AASI^CLPFSIJUA QSYTVDWSVGMVLSLLYLGVGCSWYAYWLWNKGMSRVPANVSGLLI 

190 200 210 220 230 240 

orf 62a SLEPWGVLLAVLI LGEHLSPVSVLGVFWIAATLVAGRLSHQKX 
250 260 270 280 

The complete length 0RF62a nucleotide sequence <SEQ ID 245> is: 



1 ATGTTTTACC AAATCCTTGC CCTGATTATC TGGAGCAGCT CGTTTATTGC

51 CGCCAAATAT GTCTATGGCG GCATCGATCC CGCATTGATG GTCGGCGTGC

101 GCCTGCTGAT TGCTGCGCTG CCTGCACTGC CCGCCTGCCG CCGTCATGTC

151 GGCAAGATTC CGCGTGAGGA ATGGAAGCCG TTGCTGATTG TGTCGTTCGT

201 CAACTATGTG CTGACCCTGC TACTTCAGTT TGTCGGGTTG AAATACACTT

251 CCGCCGCCAG CGCATCGGTC ATTGTCGGAC TCGAGCCACT GCTGATGGTG

301 TTTGTCGGAC ACTTTTTCTT CAACGACAAA GCGCGTGCCT ACCACTGGAT

351 ATGCGGCGCG GCGGCATTTG CCGGTGTCGC GCTGCTGATG GCGGGCGGTG

401 CGGAAGAGGG CGGCGAAGTC GGCTGGTTCG GCTGCCTGCT GGTGTTGTTG

451 GCGGGCGCGG GCTTTTGTGC CGCTATGCGT CCGACGCAAA GGCTGATTGC

501 ACGCATCGGC GCACCGGCAT TCACATCTGT TTCCATTGCC GCCGCATCGT

551 TGATGTGCCT GCCGTTTTCG CTTGCTTTGG CGCAAAGTTA TACCGTGGAC

601 TGGAGCGTCG GAATGGTATT GTCGCTGCTG TATTTGGGCG TGGGGTGCAG

651 CTGGTACGCC TATTGGCTGT GGAACAAGGG GATGAGCCGT GTTCCTGCCA

701 ACGTTTCGGG ACTGTTGATT TCGCTCGAAC CCGTCGTCGG CGTGCTGCTG

751 GCGGTTTTGA TTTTGGGCGA ACACCTGTCG CCCGTGTCCG TCTTGGGCGT

801 GTTTGTCGTC ATCGCCGCCA CCTTGGTTGC CGGCCGGCTG TCGCATCAAA

851 AATAA



This encodes a protein having amino acid sequence <SEQ ID 246>: 

1 MFYQILALII WSSSFIAAKY VYGGIDPALM VGVRLLIAAL PALPACRRHV

51 GKIPREEWKP LLIVSFVNYV LTLLLQFVGL KYTSAASASV IVGLEPLLMV

101 FVGHFFFNDK ARAYHWICGA AAFAGVALLM AGGAEEGGEV GWFGCLLVLL

151 AGAGFCAAMR PTQRLIARIG APAFTSVSIA AASLMCLPFS LALAQSYTVD

201 WSVGMVLSLL YLGVGCSWYA YWLWNKGMSR VPANVSGLLI SLEPVVGVLL

251 AVLILGEHLS PVSVLGVFVV IAATLVAGRL SHQK*

ORF62a and ORF62-1 show 98.9% identity in 284 aa overlap: 

orf 62a . pep MFYQIIALIIWSSSFIAAKYVYGGIDPALMVGVRLLIAALPALPACRRHVGKIPREEWKP 60 

I 1 I I t I I I I I t I I I I I I t t I t I I I i I I I I I I I I I I M I M I I I I t M I I i I I I I I I I I t I 
orf 62-1 MFYQILALIIWSSSFIAAKYVYGGIDPAIMVGVRLLIAALPALPACRRHVGKIPREEWKP 60 

orf 62a . pep LLIVSFVNYVLTLLLQFVGLKYTSAASASVIVGLEPLLMVFVGHFFFNDKARAYHWICGA 120 

I t I i I I I t t I I I I I I I I I 11 t I I M I I i I I I I I I I I I I I t I I 1 I I I 1 I I I I I I I I I M I I 
orf 62-1 LLIVSFVNYVLTLLLQFVGLKYTSAASASVIVGLEPLLMVFVGHFFFNDKARAYHWICGA 120 

orf 62a . pep AAFAGVALLMAGGAEEGGEVGWFGCLLVLLAGAGFCAAMRPTQRLIARIGAPAFTSVSIA 180 

I I I I M n I I I I t M I I I I I I I I I I I i I I I I I I t I I I I I I I M I i I I I I I M I I i 1 I I I I 
or f 62- 1 AAFAGVALLMAGGAEEGGEVGWFGCLLVLLAGAGFCAAMRPTQRLIARIGAPAFTSVSIA 180 

orf 62a , pep AASLMCLPFSLALAQSYTVDWSVGMVLSLLYLGVGCSWYAYWLWNKGMSRVPANVSGLLI 240 

tllllilMIIIIIMiltllllllllllllll:lt:lillllliiniiltllllltll 
orf 62-1 AASLMCLPFSLALAQSYTVDWSVGMVLSLLYLGLGCGWYAYWLWNKGMSRVPANVSGLLI 240 

orf 62a . pep SLEPWGVLLAVLILGEHLSPVSVLGVFWIAATLVAGRLSHQKX 285 

lttlMIIIII)MIIII]|lil:MlittillllilllllMlt 
orf62-l SLE P WGVLLAVLI LGEHLS PVS ALGVFWI AAT LVAGRLSHQKX 285 
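
The identity figures quoted for these alignments can be sanity-checked by counting matching columns. The sketch below is illustrative only: the function name `percent_identity` is hypothetical, and it assumes identity is defined as identical, non-gap columns divided by the alignment length, which may differ from the convention used by the alignment program behind the reported values. The two fragments are residues 211-220 of ORF62a and ORF62-1.

```python
def percent_identity(aligned_a, aligned_b, gap="-"):
    """Percentage of alignment columns holding the same residue.

    Assumes both strings come from one pairwise alignment (equal length)
    and that a gap character never counts as a match.
    """
    if len(aligned_a) != len(aligned_b):
        raise ValueError("aligned sequences must have equal length")
    if not aligned_a:
        raise ValueError("empty alignment")
    matches = sum(
        1 for a, b in zip(aligned_a, aligned_b) if a == b and a != gap
    )
    return 100.0 * matches / len(aligned_a)


# Residues 211-220 of ORF62a and ORF62-1; they differ at two positions.
fragment_orf62a = "YLGVGCSWYA"
fragment_orf62_1 = "YLGLGCGWYA"
print(percent_identity(fragment_orf62a, fragment_orf62_1))
```

For scale, three differing positions across a 284-residue overlap give 100 x 281/284, which rounds to 98.9%.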



Homology with a predicted ORF from N.gonorrhoeae

ORF62 shows 99.5% identity over a 216aa overlap with a predicted ORF (ORF62.ng) from N. 
gonorrhoeae: 
orf62.pep MFYQILALIIWSSSFIAAKYVYGGIDPALMVGVRLLIAALPALPACRRHVGKIPREEWKP 60

]MlltMMI:ttlll)IMtli[illllllllltlllliltllttll!lilill))lt 
orf62ng MFYQIIALIIWGSSFIAAKYVYGGIDPALMVGVRLLIAALPALPACRRHVGKIPREEWKP 60 

orf 62 . pep LLIVSFVNYVLTLLLQFVGLKYTSAASASVIVGLEPLLMVEVGHFFFNDKARAYHWICGA 120 

lllttitlllllllllltllllllltlllllllllliltillllllllllllllllllll 
orf62ng lLIVSFVNYVLTLLLQFVGLECYTSAASASVIVGLEPLLMVFVGHFFFNDKARAYHWICGA 120 

orf 62 . pep AAFAGVALLMAGGAEEGGEVGWFGCLLVLLAGAGFCA/iMRPTQRLIARIGAPAFTSVSIA 180 

I i I i I I I i I I I M i I I I 1 i I I I i M I I I i I I I I 1 I I I I I I I I I I I I I I I I I I I I I I M I I 
orf62ng AAFAGVALLMAGGAEEGGEVGWFGCLLVLLAGAGFCAAMRPTQRLIARIGAPAFTSVSIA 180 

orf 62. pep AASLMCLPFSLALAQSYTVDWSVGMVLSLLYLGLGC 216 

1 I I I I I i I I I t I I I i I I I I I i I I I I f I I I t I I I I I I 
orf62ng AASLMCLPFSLALAQSYTVDWSVGMVLSLLYLGLGCGWYAYWLWNKGMSRVPANASGLLI 240 

The complete length ORF62ng nucleotide sequence <SEQ ID 247> is: 

1 ATGTTTTACC AAATCCTTGC CCTGATTATC TGGGGCAGCT CGTTTATTGC 

51 CGCCAAATAT GTCTATGGCG GCATCGATCC CGCATTGATG GTCGGCGTGC 

101 GCCTGCTGAT TGCCGCGCTG CCTGCACTGC CCGCCTGCCG CCGTCATGTC 

151 GGCAAGATTC CGCGTGAGGA ATGGAAGCCG TTGCTGATTG TGTCGTTCGT 

201 CAACTATGTG CTGACCCTGC TGCTTCAGTT TGTCGGGTTG AAATACACTT 

251 CCGCCGCCAG CGCATCGGTC ATTGTCGGAC TCGAGCCGCT GCTGATGGTG 

301 TTTGTCGGAC ACTTTTTCTT CAACGACAAA GCGCGTGCCT ACCACTGGAT 

351 ATGCGGCGCG GCGGCATTTG CCGGTGTCGC GCTGCTGATG GCGGGCGGTG 

401 CGGAAGAGGG CGGCGAAGTC GGCTGGTTCG GCTGCCTGCT GGTGTTGTTG 

451 GCGGGCGCGG GCTTTTGTGC CGCTATGCGT CCGACGCAAA GGCTGATTGC 

501 CCGCATCGGC GCACCGGCAT TCACATCTGT TTCCATTGCC GCCGCATCGT 

551 TGATGTGCCT GCCGTTTTCG CTTGCTTTGG CGCAAAGTTA TACCGTGGAC 

601 TGGAGCGTCG GGATGGTATT GTCGCTGTTG TATTTGGGTT TGGGGTGCGG 

651 CTGGTACGCC TATTGGCTGT GGAACAAGGG GATGAGCCGT GTTCCTGCCA 

701 ACGCGTCGGG ACTGTTGATT TCGCTCGAAC CCGTCGTCGG CGTGCTGTTG 

751 GCGGTTTTGA TTTTGGGCGA ACATTTATCG CCCGTGTCCG CCTTGGGCGT 

801 GTTTGTCGTC ATCGCCGCCA CTTTCGCCGC CGGCCGGCTG TCGCGCAGGG 

851 ACGCGCAAAA CGGCAATGCC GTCTGA 

This encodes a protein having amino acid sequence <SEQ ID 248>: 

1 MFYQILALII WGSSFIAAKY VYGGIDPALM VGVRLLIAAL PALPACRRHV

51 GKIPREEWKP LLIVSFVNYV LTLLLQFVGL KYTSAASASV IVGLEPLLMV

101 FVGHFFFNDK ARAYHWICGA AAFAGVALLM AGGAEEGGEV GWFGCLLVLL

151 AGAGFCAAMR PTQRLIARIG APAFTSVSIA AASLMCLPFS LALAQSYTVD

201 WSVGMVLSLL YLGLGCGWYA YWLWNKGMSR VPANASGLLI SLEPVVGVLL

251 AVLILGEHLS PVSALGVFVV IAATFAAGRL SRRDAQNGNA V*



ORF62ng and ORF62-1 show 97.9% identity in 283 aa overlap: 

10 20 30 40 50 60 

orf 62ng . pep MFYQILALIIWGSSFIAAKYVYGGIDPALMVGVRLLIAALPALPACRRHVGKIPREEWKP 
I I t I I i I 1 I I I : I t I I I t I i I 1 M I I i I i I i I It t I I I I I I I I i I I I i I I I I I t I I I I I I 
orf 62-1 MFYQILALIIWSSSFIAAKYVYGGIDPALMVGVRLLIAALPALPACRRHVGKIPREEWKP 

10 20 30 40 50 60 

70 80 90 100 110 120 

orf 62ng . pep LLIVSFVNYVLTLLLQFVGLKYTSAASASVIVGLEPLLMVFVGHFFFNDKARAYHWICGA 
I i t I i M i I M I I I I I I I I I I i I I I I I I I I i I I H I I I I I I I I I I I I I I I I I I I I I I I i I 
orf 62-1 LLIVSFVNYVLTLLLQFVGLKYTSAASASVIVGLEPLLMVFVGHFFFNDKARAYHWICGA 

70 80 90 100 110 120 

130 140 150 160 170 180 

orf 62ng . pep AAFAGVALLMAGGAEEGGEVGWFGCLLVLLAGAGFCAAMRPTQRLIARIGAPAFTSVSIA 

I I I I i I I I I M I ! t I I I I I I I I M i I I I I I I ! I I It I I I I I 1 I I i t I I I I M I i I i I I I t 
or f 62 - 1 MFAGVALLMAGGAEEGGEVGWFGCLLVLLAGAGFCAAMRPTQRLIARI GAPAFTSVS I a 

130 140 150 160 170 180 

190 200 210 220 230 240 

orf 62ng . pep AASUICLPFSLALAQSYTVDWSVGMVLSLLYLGLGCGWYAYWLWNKGMSRVPANASGLLI 

II I I I I I i II II t I t t t I I I II I I i 11 t II II t III t I I i I t I I I I I I I I I I I I : i I [ i i 

orf 62-1 aaslmclpfslalaqsytvdwsvgmvlsllylglgcgwyaywlwnkgmsrvpanvsglli 

190 200 210 220 230 240 
250 260 270 280 290

orf 62ng . pep SLEPWGVLLAVLILGEHLSPVSALGVFWIAATFAAGRLSRRDAQNGNAVX 

I I I I I I I I I t I I I I I 1 I I I I I I I I M I I I I I I I I :: I I I I I : : 
orf 62-1 SLEPWGVLLAVLILGEHLSPVSALGVFWIAATLVAGRLSHQKX 

250 260 270 280 

Furthermore, ORF62ng shows significant homology to a hypothetical H.influenzae protein: 



sp|Q57147|Y976_HAEIN HYPOTHETICAL PROTEIN HI0976 >gi|1074589|pir||B64163
hypothetical protein HI0976 - Haemophilus influenzae (strain Rd KW20)
>gi|1574004 (U32778) hypothetical [Haemophilus influenzae] Length = 128

Score = 106 bits (262), Expect = 2e-22

Identities = 56/114 (49%), Positives = 68/114 (59%)

Query: 1 MFYQILALIIWGSSFIAAKYVYGGXDPALMVGVRXXXXXXXXXXXCRRHVGKIPREEWKP 60 

M YQILAL+IW SS I K Y +DP L+V VR R KI + K 

Sbjct: 1 MLYQILALLIWSSSLIVGKLTYSMMDPVLWQVRLIIAMIIVMPLFLRRWKKIDKPMRKQ 60 

Query: 61 LLIVSFVNYVLTLLLQFVGLKYTSAASASVIVGLEPLLMVFVGHFFFNDKARAY 114

L ++F NY LLQF+GLKYTSA+SA ++GLEPLL+VFVGHFFF K + 
Sbjct: 61 LWWLAFFNYTAVFLLQFIGLKYTSASSAVTMIGLEPLLWFVGHFFFKTKQNGF 114 



Based on this analysis, including the homology with the transmembrane protein of H.influenzae
and the putative leader sequence and several transmembrane domains in the gonococcal protein,
it is predicted that these proteins from N.meningitidis and N.gonorrhoeae, and their epitopes, could
be useful antigens for vaccines or diagnostics, or for raising antibodies.



Example 30 

The following partial DNA sequence was identified in N.meningitidis <SEQ ID 249>: 

1 ATGCGCCGTT TTCTACCGAT CGCAGCCATA TGCGCmGwms TCCTGkkGTA 

51 sGGACTGACG GCGGCAACCG GCAGCACCAG TTCGCTGGCG GATTATTTCT

101 GGTGGATTGT TGCGTTCAGC GCAATGCTGC TGCTC^TGTT GTCCGCCGTT 

151 TTGGCACGTT ATGTCATATT GCTGTTGAAA GACAGGCGCG ACGGCGTATT 

201 CGGTTCGCtA srTyGCCAAA gsGCCTgkks TGGG.ATGTT TACGCTGGTT 

251 GCCGkACTGC CCGGCGTGTT TCTGTTCGGC TTTCCCGCAC AGTTCATCAA 

301 CGGCACGATT AATTCGTGGT TCGGCAACGA TACCCACGAG GCGCTTGAAC 

351 GCAGCCTCAA TTTGAGCAAG TCCGCATTGA ATTTGGCGGC AGACAACGCC 

401 CTCGGCAACG CCGTCCCCGT GCAGATAGAC CTCATCGGCG CGGCTTCCCT 

451 GCCCGGGGAT ATGGGCAGGG TGCTGGAACA TTACGCCGGC AGCGGTTTTG 

501 CCCAGCTTGC CCTGTACAAy ksCGCAAGCG GCAAAATCGA AAAAAGCATC 

551 AACCCGCACA AGCTCGATCA GCCGTTTCCA GGTAAGGCGC GTTGC^GAaAa 

601 AATCCaACGG GCGGGTTCGG TCAGGGATTT GGAAAGCATA GGCGGCGTAT 

651 TGTaCGCGCA GGGCTGGCTG TCGGCGGGTA CGCACwACGG GCGCGATTAC 

701 GCCTTGTTTT TCCGTCAGCC GGTTCCCAAA GGCGTGGCAG AGGATGCCGT 

751 yTTAATCGAA AAGGCAAGGG CGAAATATGC TGAGTTGAGT TACAGCAAAA 

801 AAGGTTTGCA GACCTTTTTC CTGGCAACCC TGCTGATTGC CTCGCTGCTG 

851 TCGATTTTTC TTGCACTGGT CATGGCACTG TATTTCGCCC GCCGTTTCGT 

901 CGAACCCGTC CTATCGCTTG CCGAGGGGGC GAAGGCGGTG GCGCAAGGCG 

951 ATTTCAGCCA GACGCGCCCC GTGTTGCGCA ACGACGAGTT CGGACGCTTG 

1001 ACCArGTTGT TCAACCACAT GACCGAGCAG CTTTCCATCG CCAAAGATGC 

1051 AGACGAGCGC AACCGCCGGC GCGAGGAAGC CGCCAGGCAT TATCTTGAAT 

1101 GCGTGTTGGA GGGGCTGACC ACGGGCGTGG TGGTGTTTGA CGAACAAGGC 

1151 TGTCTGAAAA CCTTCAACAA AGCGGCGGGT AGO. . 

This corresponds to the amino acid sequence <SEQ ID 250; ORF64>: 

1 MRRFLPIAAI CAXXLXXGLT AATGSTSSLA DYFWWIVAFS AMLLLVLSAV 

51 LARYVILLLK DRRDGVFGSX XAKXPXXXMF TLVAXLPGVF LFGFPAQFIN 

101 GTINSWFGND THEALERSLN LSKSALNLAA DNALGNAVPV QIDLIGAASL 

151 PGDMGRVLEH YAGSGFAQLA LYNXASGKIE KSINPHKLDQ PFPGKARWEK 

201 IQRAGSVRDL ESIGGVLYAQ GWLSAGTHXG RDYALFFRQP VPKGVAEDAV 

251 LIEKARAKYA ELSYSKKGLQ TFFLATLLIA SLLSIFLALV MALYFARRFV 
301 EPVLSLAEGA KAVAQGDFSQ TRPVLRNDEF GRLTXLFNHM TEQLSIAKDA

351 DERNRRREEA ARHYLECVLE GLTTGVVVFD EQGCLKTFNK AAGT..
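
The lowercase letters m, w, s, k, y and r in the partial DNA sequence read naturally as IUPAC ambiguity codes, and the X residues in the translation fall where a codon cannot be resolved to one amino acid. The following is a minimal sketch under that assumption, using the standard genetic code; the function name `translate_ambiguous` is hypothetical and this is not the method used to generate the sequences above. It translates the first 60 nucleotides of SEQ ID 249.

```python
from itertools import product

# Standard genetic code, codons enumerated in TCAG order.
BASES = "TCAG"
AMINO = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {"".join(c): a for c, a in zip(product(BASES, repeat=3), AMINO)}

# IUPAC nucleotide codes and the bases each one can stand for.
IUPAC = {
    "A": "A", "C": "C", "G": "G", "T": "T",
    "R": "AG", "Y": "CT", "S": "CG", "W": "AT", "K": "GT", "M": "AC",
    "B": "CGT", "D": "AGT", "H": "ACT", "V": "ACG", "N": "ACGT",
}


def translate_ambiguous(dna):
    """Translate DNA, emitting X where a codon is not uniquely resolved."""
    dna = dna.upper()
    protein = []
    for i in range(0, len(dna) - len(dna) % 3, 3):
        codon = dna[i:i + 3]
        if any(base not in IUPAC for base in codon):
            protein.append("X")  # unreadable character in the source
            continue
        # Expand the codon to every concrete codon it could represent.
        options = {
            CODON_TABLE["".join(c)]
            for c in product(*(IUPAC[b] for b in codon))
        }
        protein.append(options.pop() if len(options) == 1 else "X")
    return "".join(protein)


# First 60 nucleotides of the partial sequence <SEQ ID 249>.
partial = "ATGCGCCGTTTTCTACCGATCGCAGCCATATGCGCmGwmsTCCTGkkGTAsGGACTGACG"
print(translate_ambiguous(partial))
```

The codon GCm still resolves to alanine because both GCA and GCC encode it, whereas Gwm, sTC, kkG and TAs each admit more than one residue and are reported as X, in line with the first twenty residues listed for ORF64.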

Further work revealed the complete nucleotide sequence <SEQ ID 251>: 

1 ATGCGCCGTT TTCTACCGAT CGCAGCCATA TGCGCCGTCG TCCTGTTGTA 

51 CGGACTGACG GCGGCAACCG GCAGCACCAG TTCGCTGGCG GATTATTTCT 

101 GGTGGATTGT TGCGTTCAGC GCAATGCTGC TGCTGGTGTT GTCCGCCGTT 

151 TTGGCACGTT ATGTCATATT GCTGTTGAAA GACAGGCGCG ACGGCGTATT 

201 CGGTTCGCAG ATTGCCAAAC GCCTTTCTGG GATGTTTACG CTGGTTGCCG 

251 TACTGCCCGG CGTGTTTCTG TTCGGCGTTT CCGCACAGTT CATCAACGGC 

301 ACGATTAATT CGTGGTTCGG CAACGATACC CACGAGGCGC TTGAACGCAG 

351 CCTCAATTTG AGCAAGTCCG CATTGAATTT GGCGGCAGAC AACGCCCTCG 

401 GCAACGCCGT CCCCGTGCAG ATAGACCTCA TCGGCGCGGC TTCCCTGCCC 

451 GGGGATATGG GCAGGGTGCT GGAACATTAC GCCGGCAGCG GTTTTGCCCA 

501 GCTTGCCCTG TACAATGCCG CAAGCGGCAA AATCGAAAAA AGCATCAACC 

551 CGCACAAGCT CGATCAGCCG TTTCCAGGTA AGGCGCGTTG GGAAAAAATC 

601 CAACGGGCGG GTTCGGTCAG GGATTTGGAA AGCATAGGCG GCGTATTGTA 

651 CGCGCAGGGC TGGCTGTCGG CGGGTACGCA CAACGGGCGC GATTACGCCT 

701 TGTTTTTCCG TCAGCCGGTT CCCAAAGGCG TGGCAGAGGA TGCCGTCTTA 

751 ATCGAAAAGG CAAGGGCGAA ATATGCTGAG TTGAGTTACA GCAAAAAAGG 

801 TTTGCAGACC TTTTTCCTGG CAACCCTGCT GATTGCCTCG CTGCTGTCGA 

851 TTTTTCTTGC ACTGGTCATG GCACTGTATT TCGCCCGCCG TTTCGTCGAA 

901 CCCGTCCTAT CGCTTGCCGA GGGGGCGAAG GCGGTGGCGC AAGGCGATTT 

951 CAGCCAGACG CGCCCCGTGT TGCGCAACGA CGAGTTCGGA CGCTTGACCA 

1001 AGTTGTTCAA CCACATGACC GAGCAGCTTT CCATCGCCAA AGAAGCAGAC 

1051 GAGCGCAACC GCCGGCGCGA GGAAGCCGCC AGGCATTATC TTGAATGCGT 

1101 GTTGGAGGGG CTGACCACGG GCGTGGTGGT GTTTGACGAA CAAGGCTGTC 

1151 TGAAAACCTT CAACAAAGCG GCGGAACAGA TTTTGGGGAT GCCGCTTACC 

1201 CCCCTGTGGG GCAGCAGCCG GCACGGTTGG CACGGCGTTT CGGCGCAGCA 

1251 GTCCCTGCTT GCCGAAGTGT TTGCCGCCAT CGGCGCGGCG GCAGGTACGG 

1301 ACAAACCGGT CCATGTGAAA TATGCCGCGC CGGACGATGC CAAAATCCTG 

1351 CTGGGCAAGG CAACCGTCCT GCCCGAAGAC AACGGCAACG GCGTGGTAAT 

1401 GGTGATTGAC GACATCACCG TTTTGATACA CGCGCAAAAA GAAGCCGCGT 

1451 GGGGCGAAGT GGCGAAGCGG CTGGCACACG AAATCCGCAA TCCGCTCACG 

1501 CCCATCCAGC TTTCCGCCGA ACGGCTGGCG TGGAAATTGG GCGGGAAGCT 

1551 GGATGAGCAG GATGCGCAAA TCCTGACGCG TTCGACCGAC ACCATCGTCA 

1601 AACAGGTGGC GGCATTGAAG GAAATGGTCG AAGCATTCCG CAATTATGCG 

1651 CGTTCCCCTT CGCTCAAATT GGAAAATCAG GATTTGAACG CCTTAATCGG 

1701 CGATGTGTTG GCATTGTATG AAGCCGGTCC GTGCCGGTTT GCGGCGGAGC 

1751 TTGCCGGCGA ACCGCTGACG GTGGCGGCGG ATACGACCGC CATGCGGCAG 

1801 GTGCTGCACA ATATTTTCAA AAATGCCGCC GAAGCGGCGG AAGAAGCCGA 

1851 TGTGCCCGAA GTCAGGGTAA AATCGGAAAC AGGGCAGGAC GGTCGGATTG 

1901 TCCTGACGGT TTGCGACAAC GGCAAAGGGT TCGGCAGGGA AATGCTGCAC 

1951 AACGCCTTCG AGCCGTATGT AACGGACAAA CCGGCGGGAA CGGGATTGGG 

2001 TCTGCCTGTG GTGAAAAAAA TCATTGAAGA ACACGGCGGC CGCATCAGCC 

2051 TGAGCAATCA GGATGCGGGT GGCGCGTGTG TCAGAATCAT CTTGCCAAAA 

2101 ACGGTAAAAA CTTATGCGTA G 

This corresponds to the amino acid sequence <SEQ ID 252; ORF64-1>:

1 MRRFLPIAAI CAVVLLYGLT AATGSTSSLA DYFWWIVAFS AMLLLVLSAV

51 LARYVILLLK DRRDGVFGSQ IAKRLSGMFT LVAVLPGVFL FGVSAQFING

101 TINSWFGNDT HEALERSLNL SKSALNLAAD NALGNAVPVQ IDLIGAASLP

151 GDMGRVLEHY AGSGFAQLAL YNAASGKIEK SINPHKLDQP FPGKARWEKI

201 QRAGSVRDLE SIGGVLYAQG WLSAGTHNGR DYALFFRQPV PKGVAEDAVL

251 IEKARAKYAE LSYSKKGLQT FFLATLLIAS LLSIFLALVM ALYFARRFVE

301 PVLSLAEGAK AVAQGDFSQT RPVLRNDEFG RLTKLFNHMT EQLSIAKEAD

351 ERNRRREEAA RHYLECVLEG LTTGVVVFDE QGCLKTFNKA AEQILGMPLT

401 PLWGSSRHGW HGVSAQQSLL AEVFAAIGAA AGTDKPVHVK YAAPDDAKIL

451 LGKATVLPED NGNGVVMVID DITVLIHAQK EAAWGEVAKR LAHEIRNPLT

501 PIQLSAERLA WKLGGKLDEQ DAQILTRSTD TIVKQVAALK EMVEAFRNYA 

551 RSPSLKLENQ DLNALIGDVL ALYEAGPCRF AAELAGEPLT VAADTTAMRQ 

601 VLHNIFKNAA EAAEEADVPE VRVKSETGQD GRIVLTVCDN GKGFGREMLH 

651 NAFEPYVTDK PAGTGLGLPV VKKIIEEHGG RISLSNQDAG GACVRIILPK 

701 TVKTYA* 

Computer analysis of this amino acid sequence gave the following results:

Homology with a predicted ORF from N.meningitidis (strain A)

ORF64 shows 92.6% identity over a 392aa overlap with an ORF (ORF64a) from strain A of N.
meningitidis:



orf 64 .pep 
orf 64a 

orf 64 .pep 
orf 64a 



10 20 30 40 50 60 

MRRFLPIAAICAXXLXXGLTAATGSTSSLA DYFWWIVAFSAM LLLVLSAVLARYVILLLK 
I t I I I I I I I I t t I M I I i I I I I i i I I I I I M I t I t t I M I I I t I I I 1 I I I I I M M 
MRRF LPIAAICAWLLYGLTAATGSTSSLA DYFWWIVAFSAM LLLVLSAVLARYVILLL K 

50 



10 



20 



30 



40 



60 



70 80 90 100 110 120 

DRRDGVFGSXXAKXPXX XMFTLVAXLPGVFLFG FPAQFINGTINSWFGNDTHEALERSLN 
M I I t I I I I II Mini I I I I i I I I It II I I I [ II I II I i I II I I I II I I 

DRRDGVFGSQIAKR-LS GMFTLVAVLPGVFLFGV SAQFINGTINSWFGNDTHEALERSLN 

70 80 90 100 110 



130 140 150 160 170 180 

orf 64 . pep LSKSALNLAADNALGNAVPVQIDLIGAASLPGDMGRVLEHYAGSGFAQLALYNXASGKIE 
I I II I II II M I I II II: I I I I I I I I I I I I I I 1 II I I II t I I i 1 I I I t I t I I I II I I 
orf 64a LSKSALNLAADNALGNAIPVQIDXIGAASLPXDMGRVLEHYAGSGFAQLALYNAASGKIE 
120 130 140 150 160 170 



190 200 210 220 230 240 

orf 64. pep KSINPHKLDQPFPGKARWEKIQRAGSVRDLESIGGVLYAQGWLSAGTHXGRDYALFFRQP 
I I I II I I I I I I I I I M I t I t I I: M I I I I I I M I I I I I II I I I II II I I I I M I I I 
orf 64a KSINPHKLDQPFPGKARWEKIQQAGSVRDXESIGGVLYAXGWLSAXTHNGRDYALFFRQP 
180 190 200 210 220 230 



250 260 270 280 290 300 

orf 64 . pep VPKGVAEDAVLIEKjVRAKYAELSYSKKGLQTFFLAT LLIASLLSIFLALVMALY FARRFV 
I II t I It I I t I M I t I I llllllltlllltltllllltltltllllltllttlMI 
orf 64a VPKGVAEDAVLIEKARAXXXXLSYSKKGLQTFFLAT LLIASLLSIFLALVMALY FARRFV 
240 250 260 270 280 290 



310 320 330 340 350 360 

or f 64 . pep EPVLSLAEGAKAVAQGDFSQTRPVLRNDEFGRLTXLFNHMTEQLSIAKDADERNRRREEA 
llllllltllllltllllllllllllllllllll t I I I I I I I I I I I I : I I I I ) II I It t 
orf 64 a ePVLSLAEGAKAVAQGDFSQTRPVLRNDEFGRLTKLFNHMTEQLSIAKEADERNRRREEA 
300 310 320 330 340 350 



370 380 390 

orf 64 .pep ARHYLECVLEGLTTGVWFDEQGCLKTFNKAAGT 

I 1 I I I I I I I } I I I I I t I I I t I I I I i I I I I I I I 
orf 64 a aRHYLECVLEGLTTGVWFDEQGCLKTFNKAAEQILGMPLTPLWGSSRHGWHGVSAQQSL 
360 370 380 390 400 410 



orf 64a LAEVFAAIGAAAGTDKPVHVKYAAPDDAKILLGKATVLPEDNXNGWMVIDDITVLIHAQ 
420 430 440 450 460 470 

The complete length ORF64a nucleotide sequence <SEQ ID 253> is: 



1 ATGCGCCGTT TTCTACCGAT CGCAGCCATA TGCGCCGTCG TCCTGTTGTA 

51 CGGACTGACG GCGGCAACCG GCAGCACCAG TTCGCTGGCG GATTATTTCT 

101 GGTGGATTGT TGCGTTCAGC GCAATGCTGC TGCTGGTGTT GTCCGCCGTT 

151 TTGGCACGTT ATGTCATATT GCTGTTGAAA GACAGGCGCG ACGGCGTATT 

201 CGGTTCGCAG ATTGCCAAAC GCCTTTCCGG GATGTTTACG CTGGTTGCCG 

251 TACTGCCCGG CGTGTTTCTG TTCGGCGTTT CCGCACAGTT TATCAACGGC 

301 ACGATTAATT CGTGGTTCGG CAACGATACC CACGAGGCGC TTGAACGCAG 

351 CCTCAATTTG AGCAAGTCCG CATTGAATCT GGCGGCAGAC AACGCCCTTG 

401 GCAACGCCAT CCCCGTGCAG ATAGACNTCA TCGGCGCGGC TTCCCTGCCC 

451 NGGGATATGG GCAGGGTGCT GGAACATTAC GCCGGCAGCG GTTTTGCCCA 

501 GCTTGCCCTG TACAATGCCG CAAGCGGCAA AATCGAAAAA AGCATCAACC 

551 CGCACAAGCT CGATCAGCCG TTTCCAGGTA AGGCGCGTTG GGAAAAAATC 

601 CAACAGGCGG GTTCGGTCAG GGATNNGGAA AGCATAGGCG GCGTATTGTA 

651 CGCGCANGGC TGGCTGTCGG CAGNNACGCA CAACGGGCGC GATTACGCCT

701 TGTTTTTCCG TCAGCCGGTT CCCAAAGGCG TGGCAGAGGA TGCCGTCTTA 

751 ATCGAAAAGG CAAGGGCGNA ANANNNTNAG TTGAGTTACA GCAAAAAAGG 

801 TTTGCAGACC TTTTTCCTNG CAACCCTGCT GATTGCCTCN CTGCTGTCGA 

851 TTTTTCTTGC ACTGGTCATG GCACTGTATT TCGCCCGCCG TTTCGTCGAA 






901 CcCGTCCTAT CGCTTGGCGA GGGGGCGAAG GcGGTGGCGC AAGGCGATTT

951 CAGCCAGACG CGCCCCGTGT TGCGCAACGA CGAGTTCGGA CGCTTGACCA 

1001 AGTTGTTCAA CCACATGACC GAGCAGCTTT CCATCGCCAA AGAAGCAGAC 

1051 GAGCGCAACC GCCGGCGCGA GGAAGCCGCC AGACATTATC TCGAATGCGT 

1101 GTTGGAGGGG CTGACCACGG GCGTGGTGGT GTTTGACGAA CAAGGCTGTC 

1151 TGAAAACCTT CAACAAAGCG GCGGAACAGA TTTTGGGGAT GCCGCTTACC 

1201 CCCCTGTGGG GCAGCAGCCG GCACGGTTGG CACGGCGTTT CGGCGCAGCA 

1251 GTCCCTGCTT GCCGAAGTGT TTGCCGCCAT CGGCGCGGCG GCAGGTACGG 

1301 ACAAACCGGT CCATGTGAAA TATGCCGCGC CGGACGATGC CAAAATCCTG 

1351 CTGGGCAAGG CAACCGTCCT GCCCGAAGAC AACNGCAACG GCGTGGTAAT 

1401 GGTGATTGAC GACATCACCG TTTTGATACA CGCGCAAAAA GAAGCCGCGT 

1451 GGGGCGAAGT GGCAAAACGG CTGGCACACG AAATCCGCAA TCCGCTCACG 

1501 CCCATCCAGC TTTCTGCCGA ACGGCTGGCG TGGAAATTGG GCGGGAAGCT 

1551 GGACGAGCAN GACGCGCAAA TCCTGACACG TTCGACCGAC ACCATCATCA 

1601 AACAAGTGGC GGCATTAAAA GAAATGGTCG AGGCATTCCG CAATTACNCG 

1651 CGTTCCCCTT CGNCTCAATT GGAAAATCAG GATTTGAACG CCTTAATCGG 

1701 CGATGTGTTG GCATTGTACG AAGCTGGTCC GTGCCGGTTT GCGGCGGAAC 

1751 TTGCCGGCGA ACCGCTGATG ATGGCGGCGG ATACGACCGC CATGCGGCAG 

1801 GTGCTGCACA ATATTTTCAA AAATGCCGCC GAAGCGGCGG AAGAAGCCGA 

1851 TGTGCCCGAA GTCAGGGTAA AATCGGAAGC GGGGCAGGAC GGACGGATTG 

1901 TCCTGACAGT TTGCGACAAC GGCAAGGGGT TCGGCAGGGA AATGCTGCAC 

1951 AATGCCTTCG AGCCGTATGT AACGGACAAA CCGGCTGGAA CGGGATTGNG 

2001 ACTGCCCGTG GTGAAAAAAA TCATTGAAGA ACACGGCGGC CNCATCAGCC 

2051 TGAGCAATCA GGATGCGGGC GGCGCGTNTG TCAGAATCAT CTTGCCAAAA 

2101 ACGGTAGAAA CTTATGCGTA G 

This encodes a protein having amino acid sequence <SEQ ID 254>: 

1 MRRFLPIAAI CAVVLLYGLT AATGSTSSLA DYFWWIVAFS AMLLLVLSAV
51 LARYVILLLK DRRDGVFGSQ IAKRLSGMFT LVAVLPGVFL FGVSAQFING
101 TINSWFGNDT HEALERSLNL SKSALNLAAD NALGNAIPVQ IDXIGAASLP
151 XDMGRVLEHY AGSGFAQLAL YNAASGKIEK SINPHKLDQP FPGKARWEKI
201 QQAGSVRDXE SIGGVLYAXG WLSAXTHNGR DYALFFRQPV PKGVAEDAVL
251 IEKARAXXXX LSYSKKGLQT FFLATLLIAS LLSIFLALVM ALYFARRFVE
301 PVLSLAEGAK AVAQGDFSQT RPVLRNDEFG RLTKLFNHMT EQLSIAKEAD
351 ERNRRREEAA RHYLECVLEG LTTGVVVFDE QGCLKTFNKA AEQILGMPLT
401 PLWGSSRHGW HGVSAQQSLL AEVFAAIGAA AGTDKPVHVK YAAPDDAKIL
451 LGKATVLPED NXNGVVMVID DITVLIHAQK EAAWGEVAKR LAHEIRNPLT
501 PIQLSAERLA WKLGGKLDEX DAQILTRSTD TIIKQVAALK EMVEAFRNYX
551 RSPSXQLENQ DLNALIGDVL ALYEAGPCRF AAELAGEPLM MAADTTAMRQ
601 VLHNIFKNAA EAAEEADVPE VRVKSEAGQD GRIVLTVCDN GKGFGREMLH
651 NAFEPYVTDK PAGTGLXLPV VKKIIEEHGG XISLSNQDAG GAXVRIILPK
701 TVETYA*

ORF64a and ORF64-1 show 96.6% identity in 706 aa overlap: 

10 20 30 40 50 60 

MRRFLPIAAICAWLLYGLTAATGSTSSLADYFWWIVAFSAMLLLVLSAVLARYVILLLK 
) I I I n I I I I I I I M I I I I I I I I I I I I I I I I I I I I I I I I I ) I I I I M M M I I I M I I I t 
MRRFLPIAAI CAWLLYGLTAATGSTSSLADYFWWIVAFSAMLLLVLSAVLARYVILLLK 
10 20 30 40 50 60 

70 80 90 100 110 120 

DRRDGVFGSQIAKRLSGMFTLVAVLPGVFLFGVSAQFINGTXNSWFGNDTHEALERSLNL 
t I I I I 1 1 I I I I I I I I I I I I i I i I I t i I t t i I I I I I It I I It I i I I M I N M I I I I I I I i 
DRRDGVFGSQIAKRLSGMFTLVAVLPGVFLFGVSAQFINGTINSWFGNDTHEALERSLNL 
70 80 90 100 110 120 

130 140 150 160 170 180 

SKSALNLAADNALGNAIPVQIDXIGAASLPXDMGRVLEHYAGSGFAQLALYNAASGKIEK 
[tlllllllllllMt:iltll IMIIIt II t 11 I I t I I I I I I II i I I M I I II It I I 
SKSALNLAADNALGNAVPVQIDLXGAASLPGDMGRVLEHYAGSGFAQLAL YNAASGKIEK 
130 140 150 160 170 180 

190 200 210 220 230 240 

SINPHKLDQPFPGKARWEKIQQAGSVRDXESIGGVLYAXGWLSAXTHNGRDYALFFRQPV 

lltitltttltltttltlttCllllli lllllllll IIMI I I I I I M I I I II i I I 
SINPHKLDQPFPGKARWEKIQRAGSVRDLESIGGVLYAQGWLSAGTHNGRDYALFFRQPV 
190 200 210 220 230 240 



orf 64a.pep 
orf64-l 

orf 64a .pep 
orf64-l 

orf 64a, pep 
orf64-l 

orf 64a. pep 
orf64-l 



250 260 270 280 290 300 

orf 64a . pep PKGVAEDAVLIEKARAXXXXLSYSKKGLQTFFIATLLIASLLSIFIALVMALYFARRFVE 
I t I I I i I M I M I I I I I I I I I It I I I I I I I I I i t I M I I t It I I I I I I t I i I I I I t 
orf 64-1 PKGVAEDAVLIEKARAKYAELSYSKKGLQTFFIATLLIASLLSIFLALVMALYFARRFVE 
250 260 270 280 290 300 

310 320 330 340 350 360 

orf 64a . pep PVLSLAEGAKAVAQGDFSQTRPVLRNDEFGRLTKLFNHMTEQLSIAKEADERNRRREEAA 
I t I II t I I I II I I I I I I I t I 11 t I i I II I II I t I I II II I ! I II I II I I 1 I t li I II I I I 
orf 64-1 PVLSLAEGAKAVAQGDFSQTRPVLRNDEFGRLTKLFNHMTEQLSIAKEADERNRRREEAA 

310 320 ^ 330 340 350 360 

370 380 390 400 410 420 

orf 64a . pep RHYLECVLEGLTTGVWFDEQGCLKTFNKAAEQILGMPLTPLWGSSRHGWHGVSAQQSLL 

I II M I I I I II M I I it t I I I I II I I It t I I I I M I I t I I II It i I I It I I I I I It It I t 
orf 64-1 RHYLECVLEGLTTGVWFDEQGCLKTFNKAAEQILGMPLTPLWGSSRHGWHGVSAQQSLL 

370 380 390 400 410 420 

430 440 450 460 470 480 

orf 64a . pep AEVFAAIGAAAGTDKPVHVBCYAAPDDAKILLGKATVLPEDNXNGWMVIDDITVLIHAQK 

II I I i I I I I I I II I I t I i 1 t I I I I I t I It I I II I I II II I I I i I I I I I it I I I I I I I II 
orf 64-1 AEVFAAIGAAAGTDKPVHVKYAAPDDAKILLGKATVLPEDNGNGWMVIDDITVLIHAQK 

430 440 450 460 470 480 

490 500 510 520 530 540 

orf 64a . pep EAAWGEVAKRLAHEIRNPLTPIQLSAERLAWKLGGKLDEXDAQILTRSTDTIIKQVAALK 
llllllllllltllllllllllllllllllllllllltl llllllllttll:lllllll 
orf 64-1 EAAWGEVAECRLAHEIRNPLTPIQLSAERLAWKLGGKLDEQDAQILTRSTDTIVKQVAALK 

490 500 510 520 530 540 

550 560 570 580 590 600 

or f 64a . pep EMVEAFRNYXRSPSXQLENQDIiJALIGDVLALYEAGPCRFAAELAGEPLMMAADTTAMRQ 
I I I I I I I I I lilt : i I I i I I M I I I I I I I I I I I I I I I I I t I I I I I 1 I : I I I I It I I I 
orf 64-1 EMVEAFRNYARSPSLKLENQDLNALIGDVLALYEAGPCRFAAELAGEPLTVAADTTAMRQ 

550 560 570 580 590 600 

610 620 630 640 650 660 

orf 64a . pep VLHNIFKNAAEAAEEADVPEVRVKSEAGQDGRIVLTVCDNGKGFGREMLHNAFEPYVTDK 
I I II I I I I II 11 I I I I f I I t i t t It I : II I I I I II I I I I I II t i I I 1! I I I II II M I I I 
orf 64-1 VLHNIFKNAAEAAEEADVPEVRVKSETGQDGRIVLTVCDNGKGFGREMLHNAFEPYVTDK 

610 620 630 640 650 660 

670 680 690 700 

orf 64a .pep PAGTGLXLPWKKIIEEHGGXISLSNQDAGGAXVRIILPKTVETYAX 
lllltl t I I 1 I I I t 1 I t I I Itillllllll 111111111:1111 
orf 64-1 PAGTGLGLPWKKIIEEHGGRISLSNQDAGGACVRIILPKTVKTYAX 

670 680 690 700 






Homology with a predicted ORF from N. gonorrhoeae

ORF64 shows 86.6% identity over a 387aa overlap with a predicted ORF (ORF64.ng) from N. 
gonorrhoeae: 






orf 64 .pep 
orf 64ng 
orf 64 .pep 
orf 64ng 
orf 64 -pep 
orf 64ng 
orf 64 .pep 
orf 64ng 



MRRFLPIAAICAXXLXXGLTAATGSTSSLADYFWWIVAFSAMLLLVLSAVLARYVILLLK 
I It t I I I I I I I I I It I I I 11 I I I I I I I I I I I I I : I I I I t t t I I 1 1 I I I I t I t I t I I 
MRRFLPIAAICAWLLYGLTAATGSTSSLADYFWWIVSFSAMLLLVLSAVLARYVILLLK 



60 



60 



120 



DRRDGVFGSXXAKXPXXXMFTLVAXLPGVFLFGFPAQFINGTINSWFGNDTHEALERSLN 
111:11111 It lllltl I I I: t t It : I I I t 1 I I I I t I i I I I I 1 I 1 I I I I I I 

DRRNGVFGSQIAKR-LSGMFTLVAVLPGLFLFGISAQFINGTINSWFGNDTHEALERSLN 119 

LSKSALNLAADNALGNAVPVQIDLIGAASLPGDMGRVLEHYAGSGFAQLALYNXASGKIE 180 
I I I I 1 I : t I I I I I :: I I I I t I I I I 1 1 : I I 1 I : t I I I I I I I I I t 1 It 1 t I I 1 I I 1 I I I 

LSKSALDLAADNAVSNAVPVQIDLIGTASLSGNMGSVLEHYAGSGFAQLALYNAASGKIE 17 9 

KSINPHKLDQPFPGKARWEKIQRAGSVRDLESIGGVLYAQGWLSAGTHXGRDYALFFRQP 240 
1 i t I I 1:: I I I : I t : I I : I I : : t I I 1 : 1 I I 1 I I I I I I I t I I i I 1 I I titllllllll 

KSINPHQFDQPLPDKEHWEQIQQTGSVRSLESIGGVLYAQGWLSAGTHNGRDYALFFRQP 239 
orf 64 .pep
orf 64ng 
orf 64 .pep 
orf 64ng 
orf 64 .pep 
orf 64ng 



VPKGVAEDAVL lEKARAKYAELSY SKKGLQT FFLATLLI AS LLS I FLALVMALY FARRFV 300 

:i::tl:IIIIMI)lllllllliltlltlllll:lllllilll)tllllllllllllll 

I PENVAQDAVLIEKARAKYAELS YSKKGLQT FFLVTLLIASLLS I FLALVMALYFARRFV 299 

EPVLSLAEGAKAVAQGDFSQTRPVLRNDEFGRLTXLFNHMTEQLSIAKDADERNRRREEA 360 
thIltlltllMlillMMMIilllllllll IMIItMltlthltillllilM 
EPILSLAEGAKAVAQGDFSQTRPVLRNDEFGRLTKLFNHMTEQLSIAKEADERNRRREEA 359 

ARHYLECVLEGLTTGVWFDEQGCLKTFNKAAGT 394 
I i I I t I I I I : I I I I I I I t : I : I 

ARHYLECVLDGLTTGVWSYPLSCCRTAVFSTCHSSPLSYF 4 00 



An ORF64ng nucleotide sequence <SEQ ID 255> was predicted to encode a protein having amino
acid sequence <SEQ ID 256>:



1 MRRFLPIAAI CAVVLLYGLT AATGSTSSLA DYFWWIVSFS AMLLLVLSAV
51 LARYVILLLK DRRNGVFGSQ IAKRLSGMFT LVAVLPGLFL FGISAQFING
101 TINSWFGNDT HEALERSLNL SKSALDLAAD NAVSNAVPVQ IDLIGTASLS
151 GNMGSVLEHY AGSGFAQLAL YNAASGKIEK SINPHQFDQP LPDKEHWEQI
201 QQTGSVRSLE SIGGVLYAQG WLSAGTHNGR DYALFFRQPI PENVAQDAVL
251 IEKARAKYAE LSYSKKGLQT FFLVTLLIAS LLSIFLALVM ALYFARRFVE
301 PILSLAEGAK AVAQGDFSQT RPVLRNDEFG RLTKLFNHMT EQLSIAKEAD
351 ERNRRREEAA RHYLECVLDG LTTGVWSYP LSCCRTAVFS TCHSSPLSYF*



Further work revealed the complete gonococcal DNA sequence <SEQ ID 257>: 



1 ATGCGCCGCT TCCTACCGAT CGCAGCCATA TGCGCCGTCG TCCTGCTGTA
51 CGGATTGACG GCGGCGACCG GCAGCACCAG TTCGCTGGCG GATTATTTCT
101 GGTGGATAGT CTCGTTCAGC GCAATGCTGC TGCTGGTGTT GTCCGCCGTT
151 TTGGCACGTT ATGTCATATT GCTGTTGAAA GACAGGCGCA ACGGCGTGTT
201 CGGTTCGCAG ATTGCCAAAC GCCTTTCCGG GATGTTCACG CTGGTCGCCG
251 TACTGCCCGG CTTGTTCCTG TTCGGCATTT CCGCGCAGTT TATCAACGGC
301 ACGATTAATT CGTGGTTCGG CAACGACACC CACGAAGCCC TCGAACGCAG
351 CCTTAATTTG AGCAAGTCCG CACTGGATTT GGCGGCAGAC AATGCCGTCA
401 GCAACGCCGT TCCCGTACAG ATAGACCTCA TCGGCACCGC CTCCCTGTCG
451 GGCAATATGG GCAGTGTGCT GGAACACTAC GCCGGCAGCG GTTTTGCCCA
501 GCTTGCCCTG TACAATGCCG CAAGCGGGAA AATCGAAAAA AGCATCAATC
551 CGCACCAATT CGACCAGCCG CTTCCCGACA AAGAACATTG GGAACAGATT
601 CAGCAGACCG GTTCGGTTCG GAGTTTGGAA AGCATAGGCG GCGTATTGTA
651 CGCGCAGGGA TGGTTGTCGG CAGGTACGCA CAACGGGCGC GATTACGCGC
701 TGTTCTTCCG CCAGCCGATT CCCGAAAATG TGGCACAGGA TGCCGTTCTG
751 ATTGAAAAGG CGCGGGCGAA ATATGCCGAA TTGAGTTACA GCAAAAAAGG
801 TTTGCAGACC TTTTTTCTGG TAACCCTGCT GATTGCCTCG CTGCTGTCGA
851 TTTTTCTTGC GCTGGTAATG GCACTGTATT TTGCCCGCCG TTTCGTCGAA
901 CCCATTCTGT CGCTTGCCGA GGGCGCAAAG GCGGTGGCGC AGGGTGATTT
951 CAGCCAGACG CGCCCCGTAT TGCGCAACGA CGAGTTCGGA CGTTTGACCA
1001 AGCTGTTCAA CCATATGACC GAGCAGCTTT CCATCGCCAA AGAAGCAGAC
1051 GAACGCAACC GCCGGCGCGA GGAAGCCGCC CGTCACTACC TCGAGTGCGT
1101 GTTGGATGGG TTGACTACCG GTGTGGTGGT GTTTGACGAA AAAGGCCGTT
1151 TGAAAACCTT CAACAAGGCG GCGGAACAGA TTTTGGGGAT GCCGCTCGCC
1201 CCCCTGTGGG GCAGCAGCCG GCACGGTTGG CACGGCGTTT CGGCGCAGCA
1251 GTCCCTGCTT GCCGAAGTGT TtgccgccAT CGGTGCGGCG GCAGGTACGG
1301 ACAAACCGGT CCAGGTGGAA TATGCCGCGC CGGACGATGC CAAAATCCTG
1351 CTGGGCAAGG CGACGGTATT GCCCGAAGAC AACGGCAACG GCGTGGTGAT
1401 GGTGATTGAC GACATCACCG TGCTGATACG CGCGCAAAAA GAAGCCGCGT
1451 GGGGTGAAGT GGCGAAGCGG CTGGCACACG AAATCCGCAA TCCGCTCACG
1501 CCCATCCAGC TTTCCGCCGA ACGGCTGGCG TGGAAATTGG GCGGGAAGCT
1551 GGACGATCAG GACGCGCAAA TCCTGACGCG TtcgACCGAC ACCATCATCA
1601 AACAGgtggc gGCGTTAAAA GAAATGGTCG AGGCATTCCG CAATTACGCG
1651 CGCGCCCCTT CGCTCAAACT GGAAAATCAG GATTTGAACG CCTTAATCGG
1701 CGATGTTTTG GCCCTGTACG AAGCCGGCCC GTGCCGGTTT GAGGCGGAAC
1751 TTGCCGGCGA ACCGCTGATG ATGGCGGCGG ATACGACCGC CATGCGGCAG
1801 GTGCTGCACA ATATTTTCAA AAATGCCGCC GAAGCGGCGG AAGAAGCCGA
1851 TATGCCCGAA GTCAGGGTAA AATCGGAAAC GGGGCAGGAC GGACGGATTG
1901 TCCTGACGGT TTGCGACAAC GGCAAGGGAT TCGGCAAGGA AATGCTGCAC
1951 AATGCTTTCG AGCCGTATGT GACGGATAAG CCGGCGGGAA CGGGACTGGG
2001 TCTGCCTGTA GTGAAAAAAA TCATTGGAGA ACACGGCGGC CGCATCAGCC
2051 TGAGCAATCA GGATGCGGGT GGGGCGTGTG TCAGAATCAT CTTGCCAAAA
2101 ACGGTAGAAA CTTATGCGTA G

This corresponds to the amino acid sequence <SEQ ID 258; ORF64ng-1>:

1 MRRFLPIAAI CAVVLLYGLT AATGSTSSLA DYFWWIVSFS AMLLLVLSAV
51 LARYVILLLK DRRNGVFGSQ IAKRLSGMFT LVAVLPGLFL FGISAQFING
101 TINSWFGNDT HEALERSLNL SKSALDLAAD NAVSNAVPVQ IDLIGTASLS
151 GNMGSVLEHY AGSGFAQLAL YNAASGKIEK SINPHQFDQP LPDKEHWEQI
201 QQTGSVRSLE SIGGVLYAQG WLSAGTHNGR DYALFFRQPI PENVAQDAVL
251 IEKARAKYAE LSYSKKGLQT FFLVTLLIAS LLSIFLALVM ALYFARRFVE
301 PILSLAEGAK AVAQGDFSQT RPVLRNDEFG RLTKLFNHMT EQLSIAKEAD
351 ERNRRREEAA RHYLECVLDG LTTGVVVFDE KGRLKTFNKA AEQILGMPLA
401 PLWGSSRHGW HGVSAQQSLL AEVFAAIGAA AGTDKPVQVE YAAPDDAKIL
451 LGKATVLPED NGNGVVMVID DITVLIRAQK EAAWGEVAKR LAHEIRNPLT
501 PIQLSAERLA WKLGGKLDDQ DAQILTRSTD TIIKQVAALK EMVEAFRNYA
551 RAPSLKLENQ DLNALIGDVL ALYEAGPCRF EAELAGEPLM MAADTTAMRQ
601 VLHNIFKNAA EAAEEADMPE VRVKSETGQD GRIVLTVCDN GKGFGKEMLH
651 NAFEPYVTDK PAGTGLGLPV VKKIIGEHGG RISLSNQDAG GACVRIILPK
701 TVETYA*

ORF64ng-1 and ORF64-1 show 93.8% identity in 706 aa overlap:

10 20 30 40 50 60 

orf 64ng-l . pep MRRFLPIAAI CAWLLYGLTAATGSTSSIADYFWWIVSFSAMLLLVLSAVLARYVILLLK 
I I I I 1) [ t I t I I I t 1 I I [ I t I I I M I I t I I I I I I I I I : I I I I M I I I I I I I I I M I I t I I 
orf 64-1 MRRFLPIAAICAWLLYGLTAATGSTSSLADYFWWIVAFSAMLLLVLSAVLARYVILLLK 

10 20 30 40 50 60 

70 80 90 100 110 120 

orf64ng-l. pep DRRNGVFGSQI AKRLSGMFTLVAVLPGLFLFGI S AQFINGT INSWFGNDTHEALERSLNL 
il|:|)llll)lllllllllllllllt:ini:llllilllltllintlllMllllil 
orf 64-1 DRRDGVFGSQIAKRLSGMFTLVAVLPGVFLFGVSAQFINGTINSWFGNDTHEALERSLNL 

70 80 90 100 110 120 

130 140 150 160 170 180 

orf 64ng-l . pep SKSALDLAADNAVSNAVPVQIDLIGTASLSGNMGSVUEHYAGSGFAQLAL YNAASGKIEK 
I M I I : t M I 1) : : ] [ M I I I I I I i : I I I I: i I I I I I I I I I I I I I I M I I I I I t I I I I 
orf 64-1 SKSALNLAADNALGNAVPVQIDLIGAASLPGDMGRVLEHYAGSGFAQLALYNAASGKIEK 

130 140 150 160 170 180 

190 200 210 220 230 240 

orf64ng-l.pep SINPHQFDQPLPDKEHWEQIQQTGSVRSLESIGGVLYAQGWLSAGTHNGRDYALFFRQPI 

i : I t : I t : : I I I I : I t I It I t I t I I I I t t I t I I I I I I I I I I M I I : 
orf 64-1 SINPHKLDQPFPGKARWEKIQRAGSVRDLESIGGVLYAQGWLSAGTHNGRDYALFFRQPV 

190 200 210 220 230 240 

250 260 270 280 290 300 

orf 64ng-l . pep PENVAQDAVL lEKARAKYAELSYSKKGLQTFFLVTLLIASLLS I FLALVMALYFARRFVE 
I :: I I : I I I I 1 I I I I I I I I I t I i i i I I I I I I i I : I I I I i I I I I I I I M I t I I I I I I I I i I 
orf 64-1 PKGVAEDAVLIEKARAKYAELSYSKKGLQTFFLATLLIASLLSIFLALVMALYFARRFVE 

250 260 270 280 290 300 

310 320 330 340 350 360 

orf64ng-l.pep PILSLAEGAKAVAQGDFSQTRPVLEWDEFGRLTKLFNHMTEQLSIAKEADERNRRREEAA 
I : M I I i I I I I I I 1 t I M I I I i I t M I I I n I I t I I I I I i I I i I It I 1 M I I I I t I M I I 
orf 64-1 PVLSLAEGAKAVAQGDFSQTRPVLRNDEFGRLTKLFNHMTEQLSIAKEADERNRRREEAA 

310 320 330 340 350 360 

370 380 390 400 410 420 

orf 64ng-l . pep RHYLECVLDGLTTGWVFDEKGRLKTFNKAAEQILGMPLAPLWGSSRHGWHGVSAQQSLL 
t t I I 1 t i I : t t I I I I i t t t t : t t I I I t I I t I t t t t II i : I t I t I I t I t t I t t t t t I I t I 
orf 64-1 RHYLECVLEGLTTGWVFDEQGCLKTFNKAAEQILGMPLTPLWGSSRHGWHGVSAQQSLL 

370 380 390 400 410 420 

430 440 450 460 470 480 

orf 64ng-l . pep AEVFAAIGAAAGTDKPVQVEYAAPDDAKILLGKATVLPEDNGNGWMVIDDITVLIRAQK 
ttlttlttllllttltl:t:titlltttllllltttlllt1ttttlttltlttttt:|lt 
orf 64-1 AEVFAAIGAAAGTDKPVHVKYAAPDDAKILLGKATVLPEDNGNGWMVIDDITVLIHAQK 

430 440 450 460 470 480 

490 500 510 520 530 540 

orf 64ng-l . pep EAAWGEVAKRLAHEIRNPLTPIQLSAERLAWKLGGKLDDQDAQILTRSTDTIIKQVAALK 
I It I I I I I i t t I t t I I I I t I I I I II t It I I II t t t II t : I i t t 1 t II t I t ! t : t I t I I t I 
orf 64-1 EAAWGEVAKRLAHEIRNPLTPIQLSAERLAWKLGGKLDEQDAQILTRSTDTIVKQVAALK

490 500 510 520 530 540 

550 560 570 580 590 600 

orf 64ng-l . pep EMVEAFRNyARAPSIJaENQDLNALIGDVIJVLYEAGPCRFEM:iJVGEPIJ4MAADTTAMRQ 
I I t i [ I I I I I I : I I I I I It I I I I I I I I I I I M I i I t I I i I I I I I I I I I : I I I I M I I I 
orf 64-1 EMVEAFRNYARSPSLKLENQDLNALIGDVLALYEAGPCRFAAELAGEPLTVAADTTAMRQ 

550 560 570 580 590 600 

610 620 630 640 650 660 

orf 64ng-l . pep VLHNIFKNAAEAAEEADMPEVRVKSETGQDGRIVLTVCDNGKGPGKEMLHNAFEPYVTDK 
||||llllltllltlll:MMIIIIIIIIIIIIIIIIIII)lll:ltlillllll)lli 
orf 64-1 VLHNIFKNAAEAAEEADVPEVRVKSETGQDGRIVLTVCDNGKGFGREMLHNAFEPYVTDK 

610 620 630 640 650 660 

670 680 690 700 

orf 64ng-l . pep PAGTGLGLPWKKIIGEHGGRISLSNQDAGGACVRIILPKTVETYAX 
t I I i ) I t I I I I I I 1 I lilllllllllllltl[lllllll)l:|lil 
orf 64-1 PAGTGLGLPWKKIIEEHGGRISLSNQDAGGACVRIILPKTVKTYAX 

670 680 690 700 

Furthermore, ORF64ng-1 shows significant homology to a protein from A. caulinodans:

sp|Q04850|NTRY_AZOCA NITROGEN REGULATION PROTEIN NTRY >gi|77479|pir||S18624 ntrY
protein - Azorhizobium caulinodans >gi|38737 (X63841) NtrY gene product
[Azorhizobium caulinodans] Length = 771
Score = 218 bits (550), Expect = 7e-56

Identities = 195/720 (27%), Positives = 320/720 (44%), Gaps = 58/720 (8%)

Query: 7 lAAICAWLLYGLTAATGSTSSLADYFWWIXXXXXXXXXXXXXXXXRYVILLLKDRRNGV 66 

I+A+ ++L GLT + + + R++KRG 

Sbjct: 35 ISALATFLILMGLTPWPTHQWIS VLLVNAAAVLILSAMVGREIWRIAKARARGR 90 

Query: 67 FGSQIAKRLSGMFTLVAVLPGLFLFGISAQFINGTINSWFGNDTHEALERSLNLSKSALD 126 

+++ R+ G+F +V+V+P + + +++ ++ ++ WF T E + S++++++ + 
Sbjct: 91 AAARLHIRIVGLFAWSWPAILVAWASLTLDRGLDRWFSMRTQEIVASSVSVAQTYVR 150 

Query: 127 LAADNAVSNAVPVQIDLIGTASLSGNMGSVLEHYAG— SGFAQLALYNAASGKIEKSINP 184 

AN+ + +DL S+ YGSFQ+ AA+++ 

Sbjct: 151 EHALNIRGDILAMSADLTRLKSV YEGDRSRFNQILTAQAALRNLPGAMLI 200 

Query: 185 HQFDQPLPDKEHWEQIQQTGSVRSLESIGGVLYAQGWLSAGTHNGRDYA 233 

+ D++++ 1+ V + -HIG Q + N DY 

Sbjct: 201 RR-DLSWERAN-VNIGREFIVPANLAIGDATPDQPVIYLP— NDADYVAAWPLKDYDD 256 

Query: 234 —LFFRQPIPENVAQDAVLIEKARAKYAELSYSKKGLQTFFLVTXXXXXXXXXXXXXVMA 291 

L++IV ++AYL+ G+Q F + + 

Sbjct: 257 LYLYVARLIDPRVIGYLKTTQETLADYRSLEERRFGVQVAFALMYAVITLIVLLSAVWLG 316 

Query: 292 LYFARRFVEPILSLAEGAKAVAQGDFSQTRPVLRND-EFGRLTKLFNHMTEQLSIXXXXX 350 

L F++ V PI L A VA+G+ P+ R + + L + FN MT +L 

Sbjct: 317 LNFSKWLVAPIRRLMSAADHVAEGNLDVRVPIYRAEGDLASLAETFNKMTHELRSQREAI 376 

Query: 351 XXXXXXXXXXXHYLECVLDGLTTGVWFDEKGRLKTFNKAAEQILGMPLAPLWGSSRHGW 410 

+ E VL G+ GV+ D + R+ N++AE++LG L+ + RH 
Sbjct: 377 LTARDQIDSRRRFTEAVLSGVGAGVIGLDSQERITILNRSAERLLG— LSEVEALHRHLA 434 

Query: 411 HGVSAQQSLLAEVFXXXXXXXXTDKPVQVEYAAPDDAKILLGKATVLPEDNG NGWM 467 

V LL E + VQ D + + V E + +G V+ 

Sbjct: 435 EWPETAGLLEEA EHARQRSVQGNITLTRDGRERVFAVRVTTEQSPEAEHGWW 488 

Query: 4 68 VIDDITVLIRAQKEAAWGEVAKRLAHEIRNPLTPIQLSAERLAWKLGGKLDDQDAQILTR 527 

+DDIT LI AQ+ +AW +VA+R+AHEH-NPLTPIQLSAERL KG + QD +1 + 
Sbjct: 489 TLDDITELISAQRTSAWADVARRIAHEIKNPLTPIQLSAERLKRKFGRHV-TQDREIFDQ 547 

Query: 528 STDTIIKQVAALKEMVEAFRNYARAPSLKLENQDLNALIGDVLALYEAGPCRFEAELAGE 587 

TDTII+QV + MV+ F ++AR P +++QD++ +1 + L G + 
Sbjct: 548 CTDTIIRQVGDIGRMVDEFSSFARMPKPWDSQDMSEIIRQTVFLMRVGHPEWFDSEVP 607 

Query: 588 PLMMAA-DTTAMRQVLHNIFKNXXXXXXXXDMPEVRVK SETGQDGRIVLTVCD 639 

PMA D +QLNIKN P+VR + + G+D •fV+ + D 

Sbjct: 608 PAMPARFDRRLVSQALTNILKNAAEAIEAVP-PDVRGQGRIRVSANRVGED— LVIDIID 664 
Query: 640 NGKGFGKEMLHNAFEPYVTDKPAGTGLGLPVVKKIIGEHGGRISLSNQDAG-GACVRIIL 698

NG G +E + EPYVT + GTGLGL +V KI+ EHGG I L++ G GA +R+ L 
Sbjct: 665 NGTGLPQESRNRLLEPYVTTREKGTGLGLAIVGKIMEEHGGGIELNDAPEGRGAWIRLTL 724 
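
The percentages in the BLAST summary line above follow directly from the reported counts. As a quick consistency check, the following minimal Python sketch recomputes them. The function name `summary_percentages` is hypothetical, and rounding to the nearest whole percent is an assumption about how the report was formatted.

```python
import re

def summary_percentages(summary_line):
    """Recompute the percentages in a BLAST summary line from its counts.

    Returns a dict mapping each label (e.g. "Identities") to a tuple of
    (count, total, percentage rounded to the nearest integer).
    """
    pattern = re.compile(r"(\w+) = (\d+)/(\d+)")
    result = {}
    for label, count, total in pattern.findall(summary_line):
        count, total = int(count), int(total)
        result[label] = (count, total, round(100 * count / total))
    return result

# Summary line as reported for ORF64ng-1 against NtrY.
line = "Identities = 195/720 (27%), Positives = 320/720 (44%), Gaps = 58/720 (8%)"
for label, (count, total, pct) in summary_percentages(line).items():
    print(f"{label}: {count}/{total} = {pct}%")
```

The recomputed values agree with the 27%, 44% and 8% given in the report.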

Based on this analysis, including the presence of a putative leader sequence (double-underlined)
and several putative transmembrane domains (single-underlined) in the gonococcal protein, it is
predicted that the proteins from N. meningitidis and N. gonorrhoeae, and their epitopes, could be
useful antigens for vaccines or diagnostics, or for raising antibodies.



Example 31 

The following partial DNA sequence was identified in N. meningitidis <SEQ ID 259>:

1 ATGTACGCAT TTACCGCCGC ACAGCAACAG AAGGCACTCT TCCGGCTGGT 

51 GCTTTTTCAT ATCCTCATCA TCGCCGCCAG CAACTATCTG GTGCAGTTCC 

101 CTTTCCAAAT TTTCGGCATC CACACCACTT GGGGCGCATT TTCCTTTCCC 

151 TTCATCTTCC TTGCCACCGA CCTGACCGTC CGCATTTTCG GTTCTCACTT 

201 GGCACGGCGG ATTATCTTTT GGGTGATGTT CCCCGCCCTT TTGCTTTCCT 

251 ACGTCTTTTC CGTTTTGTTC CACAACGGCA GTTGGACAGG CTTGGGCGCG 

301 CTGTCCGAAT TCAACACCTT TGTCGGACGC ATCGCCTTAG CCAGCTTTGC 

351 CGCCTACGCG ATCGGACAAA TCCTTGATAT TTTTGTATTC AACAAATTAC 

401 GCCGTCTGAA AGCGTGGTGG ATTGCACCGA ACGCATCAAC CGTCATCGGG 

451 CACGCGTTGG ATACG. . . 

This corresponds to the amino acid sequence <SEQ ID 260; ORF66>: 

1 MYAFTAAQQQ KALFRLVLFH ILIIAASNYL VQFPFQIFGI HTTWGAFSFP
51 FIFLATDLTV RIFGSHLARR IIFWVMFPAL LLSYVFSVLF HNGSWTGLGA
101 LSEFNTFVGR IALASFAAYA IGQILDIFVF NKLRRLKAWW IAPNASTVIG
151 HALDT...

Further work revealed the complete nucleotide sequence <SEQ ID 261>:



1 ATGTACGCAT TTACCGCCGC ACAGCAACAG AAGGCACTCT TCCGGCTGGT 

51 GCTTTTTCAT ATCCTCATCA TCGCCGCCAG CAACTATCTG GTGCAGTTCC 

101 CTTTCCAAAT TTTCGGCATC CACACCACTT GGGGCGCATT TTCCTTTCCC 

151 TTCATCTTCC TTGCCACCGA CCTGACCGTC CGCATTTTCG GTTCTCACTT 

201 GGCACGGCGG ATTATCTTTT GGGTGATGTT CCCCGCCCTT TTGCTTTCCT 

251 ACGTCTTTTC CGTTTTGTTC CACAACGGCA GTTGGACAGG CTTGGGCGCG 

301 CTGTCCGAAT TCAACACCTT TGTCGGACGC ATCGCCTTAG CCAGCTTTGC 

351 CGCCTACGCG ATCGGACAAA TCCTTGATAT TTTTGTATTC AACAAATTAC 

401 GCCGTCTGAA AGCGTGGTGG ATTGCACCGA CCGCATCAAC CGTCATCGGC 

451 AACGCCTTGG ATACGCTGGT ATTTTTCGCC GTTGCCTTCT ACGCAAGCAG 

501 CGATGGATTT ATGGCGGCAA ACTGGCAGGG CATCGCTTTT GTCGATTACC 

551 TGTTCAAACT TACCGTCTGC ACCCTCTTCT TCCTGCCCGC CTACGGCGTG 

601 ATACTGAATC TGCTGACGAA AAAACTGACA ACCCTGCAAA CCAAACAGGC 

651 GCAAGACCGC CCCGCGCCCT CGCTGCAAAA TCCGTAA 

This corresponds to the amino acid sequence <SEQ ID 262; ORF66-1>:

1 MYAFTAAQQQ KALFRLVLFH ILIIAASNYL VQFPFQIFGI HTTWGAFSFP
51 FIFLATDLTV RIFGSHLARR IIFWVMFPAL LLSYVFSVLF HNGSWTGLGA
101 LSEFNTFVGR IALASFAAYA IGQILDIFVF NKLRRLKAWW IAPTASTVIG
151 NALDTLVFFA VAFYASSDGF MAANWQGIAF VDYLFKLTVC TLFFLPAYGV
201 ILNLLTKKLT TLQTKQAQDR PAPSLQNP*
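
The correspondence between a nucleotide sequence and its amino acid sequence can be checked by conceptual translation. The sketch below is a minimal illustration, assuming the standard genetic code (which shares its codon assignments with the bacterial table) and reading frame 1. The names `CODON_TABLE` and `translate` are hypothetical, and rendering codons that contain ambiguous bases as `X` is an assumption chosen to match the listings' use of `X`.

```python
# Standard genetic code, built from the conventional TCAG ordering.
BASES = "TCAG"
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    b1 + b2 + b3: AMINO_ACIDS[16 * i + 4 * j + k]
    for i, b1 in enumerate(BASES)
    for j, b2 in enumerate(BASES)
    for k, b3 in enumerate(BASES)
}

def translate(dna):
    """Translate DNA in frame 1, stopping after the first stop codon.

    Whitespace is ignored and case is folded. Codons containing ambiguous
    bases (for example N) are rendered as 'X'; a stop codon is rendered
    as '*'. A trailing incomplete codon is dropped.
    """
    seq = "".join(dna.split()).upper()
    protein = []
    for pos in range(0, len(seq) - 2, 3):
        residue = CODON_TABLE.get(seq[pos:pos + 3], "X")
        protein.append(residue)
        if residue == "*":
            break
    return "".join(protein)

# First 50 nucleotides of SEQ ID 261, as printed in the listing.
first_row = "ATGTACGCAT TTACCGCCGC ACAGCAACAG AAGGCACTCT TCCGGCTGGT"
print(translate(first_row))
```

Run on the first row of SEQ ID 261, this yields `MYAFTAAQQQKALFRL`, the first sixteen residues of ORF66-1.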

Computer analysis of this amino acid sequence gave the following results: 

Homology with the hypothetical protein o221 of E. coli (accession number P37619)
ORF66 and o221 protein show 67% aa identity in 155aa overlap:
orf66 1 MYAFTAAQQQKALFRLVLFHILIIAASNYLVQFPFQIFGIHTTWGAFSFPFIFLATDLTV 60

M F+ Q+ KALF L LFH+L+I +SNYLVQ PIG HTTWGAFSFPFIFIATDLTV 
o221 1 MNVFSQTQRYKALFWLSLFHLLVITSSNYLVQLPVSILGFHTTWGAFSFPFIFLATDLTV 60 

orf66 61 RIFGSHLARRIIFWVMFPALLLSYVFSVLFHNGSWTGLGALSEFNTFVGRIALASFAAYA 120 

RIFG+ LARRIIF VM PALL+SYV S LF+ GSW G GAL+ FN FV RIA ASF AYA 
o221 61 RIFGAPLARRIIFAVMIPALLISYVISSLFYMGSWQGFGALAHFNLFVARIATASFMAYA 120 



orf66 121 IGQILDIFVFNKLRRLKAWWIAPNASTVIGHALDT 155 

+GQILD+ VFN+LR+ + WW+AP AST+ G+ DT 
o221 121 LGQILDVHVFNRLRQSRRWWLAPTASTLFGNVSDT 155 



Homology with a predicted ORF from N. meningitidis (strain A)

ORF66 shows 96.1% identity over a 155aa overlap with an ORF (ORF66a) from strain A of N.
meningitidis:

10 20 30 40 50 60 

orf 66 .pep MYAFTAAQQQKALFRLVLFHILIIAASNYLVQFPFQIFGIHTTWGAFS FPFIFIATDLTV 
IIIIMIIIIMII M I I I I I I 1 I I I I I I I I I i I I I lllllllilllillllllllll 
orf 66a MYAFTAAQQQKALFWLVLFHILIIAASNYLVQFPFQISGIHTTWGAFS FPFIFLATDLTV 

10 20 30 40 50 60 



70 80 90 100 110 120 

orf66.pep RIFGSHLARRIIFWVMFPALLLSYVFSVLFHNGSWTGLGALSEFNTFVGRIALASFAAYA
          ||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
orf66a    RIFGSHLARRIIFWVMFPALLLSYVFSVLFHNGSWTGLGALSEFNTFVGRIALASFAAYA

70 80 90 100 110 120 



130 140 150 

orf66.pep IGQILDIFVFNKLRRLKAWWIAPNASTVIGHALDT
          :|||||||||||||||||||:||:||||||:||||
orf66a    LGQILDIFVFNKLRRLKAWWVAPTASTVIGNALDTLVFFAVAFYASSDGFMAANWQGIAF

130 140 150 160 170 180 



orf66a    VDYLFKLTVCGLFFLPAYGVILNLLTKKLTTLQTKQAQDRPAPSLQNPX
190 200 210 220 

The complete length ORF66a nucleotide sequence <SEQ ID 263> is: 



1 ATGTACGCAT TTACCGCCGC ACAGCAACAG AAGGCACTCT TCTGGCTGGT 

51 GCTTTTTCAT ATCCTCATCA TCGCCGCCAG CAACTATCTG GTGCAGTTCC 

101 CCTTCCAAAT TTCCGGCATC CACACCACTT GGGGCGCGTT TTCCTTTCCC 

151 TTCATCTTCC TCGCCACCGA CCTGACCGTC CGCATTTTCG GTTCGCACTT 

201 GGCACGGCGG ATTATCTTTT GGGTCATGTT CCCCGCCCTT TTGCTTTCCT 

251 ACGTCTTTTC CGTTTTGTTC CACAACGGCA GTTGGACGGG CTTGGGCGCG 

301 CTGTCCGAAT TCAACACCTT TGTCGGACGC ATCGCGCTGG CAAGTTTTGC 

351 CGCCTACGCG CTCGGACAAA TCCTTGATAT TTTTGTGTTC AACAAATTAC 

401 GCCGTCTGAA AGCGTGGTGG GTTGCCCCGA CTGCATCAAC CGTCATCGGC 

451 AACGCCTTAG ATACGTTGGT ATTTTTCGCC GTTGCCTTCT ACGCAAGCAG 

501 CGATGGATTT ATGGCGGCAA ACTGGCAGGG CATCGCTTTT GTCGATTACC 

551 TGTTCAAACT CACCGTCTGC GGTCTGTTTT TCCTGCCCGC CTACGGCGTG 

601 ATTCTGAATC TGCTGACGAA AAAACTGACG ACCCTGCAAA CCAAACAGGC 

651 GCAAGACCGC CCCGCGCCCT CGCTGCAAAA TCCGTAA 

This encodes a protein having amino acid sequence <SEQ ID 264>: 

1 MYAFTAAQQQ KALFWLVLFH ILIIAASNYL VQFPFQISGI HTTWGAFSFP
51 FIFLATDLTV RIFGSHLARR IIFWVMFPAL LLSYVFSVLF HNGSWTGLGA
101 LSEFNTFVGR IALASFAAYA LGQILDIFVF NKLRRLKAWW VAPTASTVIG
151 NALDTLVFFA VAFYASSDGF MAANWQGIAF VDYLFKLTVC GLFFLPAYGV
201 ILNLLTKKLT TLQTKQAQDR PAPSLQNP*

ORF66a and ORF66-1 show 97.8% identity in 228 aa overlap: 



10 20 30 40 50 60 

orf 66a . pep MYAFTAAQQQKALFWLVLFHILIIAASNYLVQFPFQISGIHTTWGAFSFPFIFLATDLTV 
I I I I t M I I I I t M I 1 I I I I I I I M I t I I I I i M I I I I I t M M t 1 I t I I M I M t I I 
orf 66-1 MYAFTAAQQQKALFRLVLFHILIIAASNYLVQFPFQIFGIHTTWGAFSFPFIFLATDLTV 
10 20 30 40 50 60

70 80 90 100 110 120 

orf 66a . pep RIFGSHLARRIIFWVMFPALLLSYVFSVLFHNGSWTGLGALSEFNTFVGRIALASFAAYA 
I I I i I I I i I I I I I I I I I I I I i t I t I I I I M I I I I I I I I i I t I I i I I I I I M I I I I I I I I I 
orf 66-1 RIFGSHLARRIIFWVMFPALLLSYVFSVLFHNGSWTGLGALSEFNTFVGRIALASFAAYA 

70 80 90 100 110 120 



130 140 150 160 170 180 

or f 66a . pep LGQILDIFVFNKLRRLKAWWVAPTASTVIGNALDTLVFFAVAFYASSDGFMAANWQGIAF 
:[ I I I I M I I I i I I t I t I I I : t I t I I I I I M I It I I I I i I I I I I I I I I I I I I I I I I i M I 
orf 66-1 IGQILDIFVFNKLRRLKAWWIAPTASTVIGNALDTLVFFAVAFYASSDGFMAANWQGIAF 
130 140 150 160 170 180 



190 200 210 220 229 

orf 66a . pep VDYLFKLTVCGLFFLPAYGVILNLLTKKLTTLQTKQAQDRPAPSLQNPX 
I t I I I I I I I I I I I I I I t I I I t I I I I I I I I t I I I I I I I I I I I I I I I I I I 
orf 66-1 VDYLFKLTVCTLFFLPAYGVILNLLTKKLTTLQTKQAQDRPAPSLQNPX 

190 200 210 220 
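
The identity figure quoted for this alignment can be reproduced by a position-by-position comparison, since the two 228-residue sequences align without gaps. The sketch below is a minimal illustration; the helper names `clean` and `identity` are hypothetical, and the sequences are copied from SEQ ID 264 and SEQ ID 262 without the terminal stop.

```python
def clean(seq):
    """Remove the whitespace used to group residues in the listing."""
    return "".join(seq.split())

def identity(a, b):
    """Return (percent identity, 1-based mismatch positions) for two
    equal-length, ungapped sequences."""
    if len(a) != len(b):
        raise ValueError("sequences must have equal length for an ungapped comparison")
    mismatches = [i + 1 for i, (x, y) in enumerate(zip(a, b)) if x != y]
    percent = 100 * (len(a) - len(mismatches)) / len(a)
    return percent, mismatches

ORF66A_SEQ = clean("""
    MYAFTAAQQQ KALFWLVLFH ILIIAASNYL VQFPFQISGI HTTWGAFSFP
    FIFLATDLTV RIFGSHLARR IIFWVMFPAL LLSYVFSVLF HNGSWTGLGA
    LSEFNTFVGR IALASFAAYA LGQILDIFVF NKLRRLKAWW VAPTASTVIG
    NALDTLVFFA VAFYASSDGF MAANWQGIAF VDYLFKLTVC GLFFLPAYGV
    ILNLLTKKLT TLQTKQAQDR PAPSLQNP
""")
ORF66_1_SEQ = clean("""
    MYAFTAAQQQ KALFRLVLFH ILIIAASNYL VQFPFQIFGI HTTWGAFSFP
    FIFLATDLTV RIFGSHLARR IIFWVMFPAL LLSYVFSVLF HNGSWTGLGA
    LSEFNTFVGR IALASFAAYA IGQILDIFVF NKLRRLKAWW IAPTASTVIG
    NALDTLVFFA VAFYASSDGF MAANWQGIAF VDYLFKLTVC TLFFLPAYGV
    ILNLLTKKLT TLQTKQAQDR PAPSLQNP
""")

percent, mismatches = identity(ORF66A_SEQ, ORF66_1_SEQ)
print(f"{percent:.1f}% identity, mismatches at {mismatches}")
```

Five positions differ (15, 38, 121, 141 and 191), giving 223/228, which rounds to the 97.8% stated above.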



Homology with a predicted ORF from N. gonorrhoeae

ORF66 shows 94.2% identity over a 155aa overlap with a predicted ORF (ORF66.ng) from N.
gonorrhoeae:

orf 66. pep MYAFTAAQQQKALFRLVLFHILIIAASNYLVQFPFQIFGIHTTWGAFSFPFIFLATDLTV 60 

||I:|||ltlllll[ttl[tttllll|[[lllllt:|lliMlltltltllltlllllll 
orf66ng MYALTAAQQQKALFRLVLFHILIIAASNYLVQFPFRIFGIHTTWGAFSFPFIFLATDLTV 60 

orf 66. pep RIFGSHLARRIIFWVMFPALLLSYVFSVLFHNGSWTGLGALSEFNTFVGRIALASFAAYA 120 

I I [ I M I I I I I It I t t I I t I I I I t I t I I I i I I II t I I t t I: I t I 1 t i I I I t t t i t I I I 
orf66ng RXFGSHLARRIIFWVMFPALSLSYVFSVLFHNGSWTGLGAPSQFNTFVGRIALASFAAYA 120 

orf 66. pep IGQILDIFVFNKLRRLKAWWIAPNASTVIGHALDT 155 

:ltllllltl:tltlllllllll ltl[l|:ltll 
orf 66ng LGQILDIFVFDKLRRLKAWWIAPAASTVIGNALDTLVFFAVAFYASSDEFMAANWQGIAF 180 

The complete length ORF66ng nucleotide sequence <SEQ ID 265> is: 



1 ATGTACGCAT TGACCGCCGC ACAGCAACAG AAGGCACTCT TCCGGCTGGT 

51 GCTTTTCCAT ATCCTCATCA TCGCCGCCAG CAACTATCTG GTGCAGTTCC 

101 CCTTCCGGAT TTTCGGCATC CACACCACTT GGGGCGCGTT TTCCTTTCCC 

151 TTCATCTTCC TCGCCACCGA CCTGACCGTC CGCATTTTCG GTTCGCACTT 

201 GGCGCGGCGG ATTATCTTTT GGGTGATGTT CCCCGCCCTT ttgCTTTcat 

251 aCGTCTTTTC CGTTTTGTTC CACAACGGCA GTTGGACGGG CTTGGGCGCG 

301 ctgTCCCAAT TCAACACCTT TGTCGGACGC ATCGCGCTGG CAAGTTTTGC 

351 CGCCTACGCG CTCGGACAAA TCCTTGATAT TTTCGTATTC GACAAATTAC 

401 GCCGTCTGAA AGCGTGGTGG ATTGCCCCGG CCGCATCAAC CGTCATCGGC 

451 AATGCACTGG ACACGTTAGT ATTTTTTGCC GTTGCCTTTT ACGCAAGCAG 

501 CGATGAATTT ATGGCGGCAA ACTGGCAGGG CATCGCTTTT GTCGATTACC 

551 TGTTCAAACT TACCGTCTGC ACCCTCTTCT TCCTGCCCGC CTACGGCGTG 

601 ATACTGAATC TGCTGACGAA AAAACTGACG GCCCTGCAAA CCAAACAGGC 

651 GCAAGACCGC CCCGTGCCCT CGCTGCAAAA TCCGTAA 

This encodes a protein having amino acid sequence <SEQ ID 266>: 



1 MYALTAAQQQ KALFRLVLFH ILIIAASNYL VQFPFRIFGI HTTWGAFSFP
51 FIFLATDLTV RIFGSHLARR IIFWVMFPAL SLSYVFSVLF HNGSWTGLGA
101 PSQFNTFVGR IALASFAAYA LGQILDIFVF DKLRRLKAWW IAPAASTVIG
151 NALDTLVFFA VAFYASSDEF MAANWQGIAF VDYLFKLTVC TLFFLPAYGV
201 ILNLLTKKLT ALQTKQAQDR PVPSLQNP*

An alternative annotated sequence is: 



1 MYALTAAQQQ KALFRLVLFH ILIIAASNYL VQFPFRIFGI HTTWGAFSFP
51 FIFLATDLTV RIFGSHLARR IIFWVMFPAL LLSYVFSVLF HNGSWTGLGA
101 LSQFNTFVGR IALASFAAYA LGQILDIFVF DKLRRLKAWW IAPAASTVIG
151 NALDTLVFFA VAFYASSDEF MAANWQGIAF VDYLFKLTVC TLFFLPAYGV
201 ILNLLTKKLT ALQTKQAQDR PVPSLQNP*
ORF66ng and ORF66-1 show 96.1% identity in 228 aa overlap:



orf66-1.pep MYAFTAAQQQKALFRLVLFHILIIAASNYLVQFPFQIFGIHTTWGAFSFPFIFLATDLTV 60
            |||:|||||||||||||||||||||||||||||||:||||||||||||||||||||||||
orf66ng     MYALTAAQQQKALFRLVLFHILIIAASNYLVQFPFRIFGIHTTWGAFSFPFIFLATDLTV 60

orf66-1.pep RIFGSHLARRIIFWVMFPALLLSYVFSVLFHNGSWTGLGALSEFNTFVGRIALASFAAYA 120
            ||||||||||||||||||||||||||||||||||||||||||:|||||||||||||||||
orf66ng     RIFGSHLARRIIFWVMFPALLLSYVFSVLFHNGSWTGLGALSQFNTFVGRIALASFAAYA 120

orf66-1.pep IGQILDIFVFNKLRRLKAWWIAPTASTVIGNALDTLVFFAVAFYASSDGFMAANWQGIAF 180
            :|||||||||:||||||||||||:|||||||||||||||||||||||| |||||||||||
orf66ng     LGQILDIFVFDKLRRLKAWWIAPAASTVIGNALDTLVFFAVAFYASSDEFMAANWQGIAF 180

orf66-1.pep VDYLFKLTVCTLFFLPAYGVILNLLTKKLTTLQTKQAQDRPAPSLQNPX 229
            ||||||||||||||||||||||||||||||:||||||||||:|||||||
orf66ng     VDYLFKLTVCTLFFLPAYGVILNLLTKKLTALQTKQAQDRPVPSLQNPX 229

Furthermore, ORF66ng shows significant homology with an E. coli ORF:

sp|P37619|YHHQ_ECOLI HYPOTHETICAL 25.3 KD PROTEIN IN FTSY-NIKA INTERGENIC
REGION (O221)
>gi|1073495|pir||S47690 hypothetical protein o221 - Escherichia coli >gi|466607
(U00039) No definition line found [Escherichia coli] >gi|1789882 (AE000423)
hypothetical 25.3 kD protein in ftsY-nikA intergenic region [Escherichia coli]
Length = 221
Score = 273 bits (692), Expect = 5e-73
Identities = 132/203 (65%), Positives = 155/203 (76%)



Query: 1 MYALTAAQQQKALFRLVLFHILIIAASNYLVQFPFRIFGIHTTWGAFSFPFIFLATDLTV 60
M + Q+ KALF L LFH+L+I +SNYLVQ PIG HTTWGAFSFPFIFLATDLTV
Sbjct: 1 MNVFSQTQRYKALFWLSLFHLLVITSSNYLVQLPVSILGFHTTWGAFSFPFIFLATDLTV 60

Query: 61 RIFGSHLARRIIFWVMFPALLLSYVFSVLFHNGSWTGLGALSQFNTFVGRIALASFAAYA 120
RIFG+ LARRIIF VM PALL+SYV S LF+ GSW G GAL+ FN FV RIA ASF AYA
Sbjct: 61 RIFGAPLARRIIFAVMIPALLISYVISSLFYMGSWQGFGALAHFNLFVARIATASFMAYA 120

Query: 121 LGQILDIFVFDKLRRLKAWWIAPAASTVIGNALDTLVFFAVAFYASSDEFMAANWQGIAF 180
LGQILD+ VF++LR+ + WW+AP AST+ GN DTL FF +AF+ S D FMA +W IA
Sbjct: 121 LGQILDVHVFNRLRQSRRWWLAPTASTLFGNVSDTLAFFFIAFWRSPDAFMAEHWMEIAL 180

Query: 181 VDYLFKLTVCTLFFLPAYGVILN 203
VDY FK+ + +FFLP YGV+LN
Sbjct: 181 VDYCFKVLISIVFFLPMYGVLLN 203





Based on this analysis, including the homology with the E.coli protein and the presence of several 
putative transmembrane domains in the gonococcal protein, it is predicted that these proteins from 
N.meningitidis and N.gonorrhoeae, and their epitopes, could be useful antigens for vaccines or
diagnostics, or for raising antibodies. 
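
The database comparison above reports its result as count/length ratios with whole-number percentages. As a minimal sketch (not part of the original analysis), the following Python snippet parses such a summary line and recomputes the percentages. The names `parse_blast_summary` and `whole_percent` are hypothetical, and dropping the fractional part is an assumed convention that happens to reproduce the reported figures.

```python
import re

# Matches fields such as "Identities = 132/203 (65%)".
FIELD = re.compile(r"(Identities|Positives|Gaps)\s*=\s*(\d+)/(\d+)\s*\((\d+)%\)")


def parse_blast_summary(line):
    """Return {field: (count, alignment_length, reported_percent)}."""
    return {name: (int(count), int(length), int(pct))
            for name, count, length, pct in FIELD.findall(line)}


def whole_percent(count, length):
    """Percentage with the fraction dropped (an assumed convention)."""
    return 100 * count // length


summary = "Identities = 132/203 (65%), Positives = 155/203 (76%)"
for name, (count, length, reported) in parse_blast_summary(summary).items():
    print(f"{name}: {count}/{length} -> {whole_percent(count, length)}% "
          f"(reported {reported}%)")
```

Recomputing the percentages this way is a quick check that an OCR-damaged summary line still has internally consistent numbers.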



Example 32 

The following partial DNA sequence was identified in N.meningitidis <SEQ ID 267>:

1 ATGGTCATAA AATATACAAA TTTGAATTTT GCGAAATTGT CGATAATTGC 

51 AATTTTGATG ATGTATTCGT TTGAAGCGAA TGCAAAyGCA GTmwrAATAT

101 CTGAAACTGT TTCAGTTGAT ACCGGACAAG GTGCGAAAAT TCATAAGTTT 

151 GTACCTTVAAA ATAGTAAAAC TTATTCATCT GATTTAATAA AAACGGTAGA 

201 TTTAACACAC AyyCCTACGG GCGCAAAAGC CCGAATCAAC GCCAAAATAA 

251 CCGCCAGCGT ATCCCGCGCC GGCGTATTGG CGGGGGTCGG CAAACTTGCC 

301 CGCTTAGgCG CGAAATTCAG CACAAGGGCG GTtCCCTATG TCGGAACAGC

351 CcTTTTAGCC CACGACGTAT ACGAAAcTTT CAAAGAAGAC ATACAGGCAC 

401 GAGGCTACCA ATACGACCCC GAAACCGACA AATTTGTAAA AGGCTACGAA 

451 TATAGTAATT GCCTTTGGTA CGAAGACAAA AGACGTATTA ATAGAACCTA 






501 TGGCTGCTAC GGCGTTGAT. . 

This corresponds to the amino acid sequence <SEQ ID 268; ORF72>: 

1 MVIKYTNLNF AKLSIIAILM MYSFEANANA VXISETVSVD TGQGAKIHKF 

51 VPKNSKTYSS DLIKTVDLTH XPTGAKARIN AKITASVSRA GVLAGVGKLA 

101 RLGAKFSTRA VPYVGTALLA HDVYETFKED IQARGYQYDP ETDKFVKGYE 

151 YSNCLWYEDK RRINRTYGCY GVD. . 
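
The partial sequence above contains lowercase IUPAC ambiguity codes (for example `y`, `m`, `w`, `r`), and the corresponding residues appear as `X` when the codon cannot be resolved. The following Python sketch illustrates one way such a translation could be performed; it is an assumption about the procedure, not a method stated in this document. It uses the standard genetic code and returns `X` whenever the possible readings of a codon disagree. The function names are hypothetical, and the input is expected to contain only IUPAC nucleotide letters (no gap dots).

```python
from itertools import product

# Standard genetic code, codons ordered with bases T, C, A, G.
BASES = "TCAG"
AMINO = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {a + b + c: AMINO[16 * i + 4 * j + k]
               for i, a in enumerate(BASES)
               for j, b in enumerate(BASES)
               for k, c in enumerate(BASES)}

# IUPAC nucleotide ambiguity codes.
IUPAC = {"A": "A", "C": "C", "G": "G", "T": "T",
         "R": "AG", "Y": "CT", "S": "CG", "W": "AT", "K": "GT", "M": "AC",
         "B": "CGT", "D": "AGT", "H": "ACT", "V": "ACG", "N": "ACGT"}


def translate_codon(codon):
    """Translate one codon; return 'X' if its possible readings disagree."""
    readings = {CODON_TABLE["".join(p)]
                for p in product(*(IUPAC[base] for base in codon.upper()))}
    return readings.pop() if len(readings) == 1 else "X"


def translate(dna):
    """Translate a DNA string in frame 1, ignoring whitespace."""
    dna = "".join(dna.split())
    usable = len(dna) - len(dna) % 3  # drop a trailing partial codon
    return "".join(translate_codon(dna[i:i + 3]) for i in range(0, usable, 3))


print(translate("ATGGTCATAA AATATACA"))  # start of SEQ ID 267
print(translate("AAyGCAGTmwrAATA"))      # region containing ambiguity codes
```

With this approach `AAy` still resolves to a single residue because both readings encode the same amino acid, whereas `wrA` has several incompatible readings and is reported as `X`.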

Further work revealed the complete nucleotide sequence <SEQ ID 269>: 



1 ATGGTCATAA AATATACAAA TTTGAATTTT GCGAAATTGT CGATAATTGC 

51 AATTTTGATG ATGTATTCGT TTGAAGCGAA TGCAAATGCA GTAAAAATAT 

101 CTGAAACTGT TTCAGTTGAT ACCGGACAAG GTGCGAAAAT TCATAAGTTT 

151 GTACCTAAAA ATAGTAAAAC TTATTCATCT GATTTAATAA AAACGGTAGA 

201 TTTAACACAC ATCCCTACGG GCGCAAAAGC CCGAATCAAC GCCAAAATAA 

251 CCGCCAGCGT ATCCCGCGCC GGCGTATTGG CGGGGGTCGG CAAACTTGCC 

301 CGCTTAGGCG CGAAATTCAG CACAAGGGCG GTTCCCTATG TCGGAACAGC 

351 CCTTTTAGCC CACGACGTAT ACGAAACTTT CAAAGAAGAC ATACAGGCAC 

401 GAGGCTACCA ATACGACCCC GAAACCGACA AATTTGCAAA GGTCTCAGGC 

451 TAA 

This corresponds to the amino acid sequence <SEQ ID 270; ORF72-1>:



1 MVIKYTNLNF AKLSIIAILM MYSFEANANA VKISETVSVD TGQGAKIHKF
51 VPKNSKTYSS DLIKTVDLTH IPTGAKARIN AKITASVSRA GVLAGVGKLA
101 RLGAKFSTRA VPYVGTALLA HDVYETFKED IQARGYQYDP ETDKFAKVSG 
151 * 

Computer analysis of this amino acid sequence gave the following results: 



Homology with a predicted ORF from N. meningitidis (strain A) 

ORF72 shows 98.0% identity over a 147aa overlap with an ORF (ORF72a) from strain A of N.
meningitidis:

                     10        20        30        40        50        60
orf72.pep    MVIKYTNLNFAKLSIIAILMMYSFEANANAVXISETVSVDTGQGAKIHKFVPKNSKTYSS
             ||||||||||||||||||||||||||||||| ||||||||||||||||||||||||||||
orf72a       MVIKYTNLNFAKLSIIAILMMYSFEANANAVKISETVSVDTGQGAKIHKFVPKNSKTYSS
                     10        20        30        40        50        60

                     70        80        90       100       110       120
orf72.pep    DLIKTVDLTHXPTGAKARINAKITASVSRAGVLAGVGKLARLGAKFSTRAVPYVGTALLA
             |||||||||| |||||||||||||||||||||||||||||||||||||||||||||||||
orf72a       DLIKTVDLTHIPTGAKARINAKITASVSRAGVLAGVGKLARLGAKFSTRAVPYVGTALLA
                     70        80        90       100       110       120

                    130       140       150       160       170
orf72.pep    HDVYETFKEDIQARGYQYDPETDKFVKGYEYSNCLWYEDKRRINRTYGCYGVD
             |||||||||||||||||||||||||:|
orf72a       HDVYETFKEDIQARGYQYDPETDKFAKVSGX

130 140 150 
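
The identity figure quoted for this comparison can be checked by counting matching positions over the overlap. The Python sketch below is illustrative only: it assumes the two sequences align from position 1 without gaps (true for ORF72 and ORF72a, but not in general, where a gapped alignment tool would be needed), and the function name `percent_identity` is hypothetical. Only the first 60 residues of each sequence are used here.

```python
def percent_identity(seq_a, seq_b):
    """Ungapped identity over the shared length of two sequences.

    Returns (matches, overlap_length, percent rounded to one decimal).
    """
    overlap = min(len(seq_a), len(seq_b))
    matches = sum(1 for a, b in zip(seq_a, seq_b) if a == b)
    return matches, overlap, round(100.0 * matches / overlap, 1)


# First 60 residues of ORF72 (partial, with one unresolved X) and ORF72a.
orf72_head = ("MVIKYTNLNF" "AKLSIIAILM" "MYSFEANANA"
              "VXISETVSVD" "TGQGAKIHKF" "VPKNSKTYSS")
orf72a_head = ("MVIKYTNLNF" "AKLSIIAILM" "MYSFEANANA"
               "VKISETVSVD" "TGQGAKIHKF" "VPKNSKTYSS")

print(percent_identity(orf72_head, orf72a_head))
```

Over the full 147-residue overlap the same counting gives 144 matches, and 144/147 rounds to the 98.0% stated in the text; the three mismatches are the two unresolved `X` residues and one V/A substitution.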

The complete length ORF72a nucleotide sequence <SEQ ID 271> is:



1 ATGGTCATAA AATATACAAA TTTGAATTTT GCGAAATTGT CGATAATTGC 

51 AATTTTGATG ATGTATTCGT TTGAAGCGAA TGCAAATGCA GTTWW^TAT 

101 CTGAAACTGT TTCAGTTGAT ACCGGACAAG GTGCGAAAAT TCATAAGTTT 

151 GTACCTAAAA ATAGTAAAAC TTATTCATCT GATTTAATAA AAACGGTAGA 

201 TTTAACACAC ATCCCTACGG GCGCAAAAGC CCGAATCAAC GCCAAAATAA 

251 CCGCCAGCGT ATCCCGCGCC GGCGTATTGG CGGGGGTCGG CAAACTTGCC 

301 CGCTTAGGCG CGAAATTCAG CACAAGGGCG GTTCCCTATG TCGGAACAGC 

351 CCTTTTAGCC CACGACGTAT ACGAAACTTT CAAAGAAGAC ATACAGGCAC 

401 GAGGCTACCA ATACGACCCC GAAACCGACA AATTTGPUVA GGTCTCAGGC 

451 TAA 

This encodes a protein having amino acid sequence <SEQ ID 272>: 
1 MVIKYTNLNF AKLSIIAILM MYSFEANANA VKISETVSVD TGQGAKIHKF
51 VPKNSKTYSS DLIKTVDLTH IPTGAKARIN AKITASVSRA GVLAGVGKLA 
101 RLGAKFSTRA VPYVGTALLA HDVYETFKED IQARGYQYDP ETDKFAKVSG 
151 * 

ORF72a and ORF72-1 show 100.0% identity in 150 aa overlap: 

                     10        20        30        40        50        60
orf72a.pep   MVIKYTNLNFAKLSIIAILMMYSFEANANAVKISETVSVDTGQGAKIHKFVPKNSKTYSS
             ||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
orf72-1      MVIKYTNLNFAKLSIIAILMMYSFEANANAVKISETVSVDTGQGAKIHKFVPKNSKTYSS
                     10        20        30        40        50        60

                     70        80        90       100       110       120
orf72a.pep   DLIKTVDLTHIPTGAKARINAKITASVSRAGVLAGVGKLARLGAKFSTRAVPYVGTALLA
             ||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
orf72-1      DLIKTVDLTHIPTGAKARINAKITASVSRAGVLAGVGKLARLGAKFSTRAVPYVGTALLA
                     70        80        90       100       110       120

                    130       140       150
orf72a.pep   HDVYETFKEDIQARGYQYDPETDKFAKVSGX
             ||||||||||||||||||||||||||||||
orf72-1      HDVYETFKEDIQARGYQYDPETDKFAKVSGX
                    130       140       150

Homology with a predicted ORF from N.gonorrhoeae 

ORF72 shows 89% identity over a 173aa overlap with a predicted ORF (ORF72.ng) from N.
gonorrhoeae:

orf72.pep    MVIKYTNLNFAKLSIIAILMMYSFEANANAVXISETVSVDTGQGAKIHKFVPKNSKTYSS 60
             || |:|||||||||||||||||||||||||| ||||:|||||||||:||||||:|: |||
orf72ng      MVTKHTNLNFAKLSIIAILMMYSFEANANAVKISETLSVDTGQGAKVHKFVPKSSNIYSS 60

orf72.pep    DLIKTVDLTHXPTGAKARINAKITASVSRAGVLAGVGKLARLGAKFSTRAVPYVGTALLA 120
             || |:||||| ||||||||||||||||||||||:|||||:| ||||:|||||||||||||
orf72ng      DLTKAVDLTHIPTGAKARINAKITASVSRAGVLSGVGKLVRQGAKFGTRAVPYVGTALLA 120

orf72.pep    HDVYETFKEDIQARGYQYDPETDKFVKGYEYSNCLWYEDKRRINRTYGCYGVD 173
             ||||||||||||||| :||||||||||||||:|||||||:|||||||||||||
orf72ng      HDVYETFKEDIQARGCRYDPETDKFVKGYEYANCLWYEDERRINRTYGCYGVDSSIMRLM 180

An ORF72ng nucleotide sequence <SEQ ID 273> was predicted to encode a protein having amino 
acid sequence <SEQ ID 274>: 

1 MVTKHTNLNF AKLSIIAILM MYSFEANANA VKISETLSVD TGQGAKVHKF

51 VPKSSNIYSS DLTKAVDLTH IPTGAKARIN AKITASVSRA GVLSGVGKLV 

101 RQGAKFGTRA VPYVGTALLA HDVYETFKED IQARGCRYDP ETDKFVKGYE 

151 YANCLWYEDE RRINRTYGCY GVDSSIMRLM PDRSRFPEVK QLMESQMYRL 

201 ARPFWNWRKE ELNKLSSLDW NNFVLNRCTF DWNGGGCAVN KGDDFRAGAS 

251 FSLGRNPKYK EEMDAKKPEE ILSLKVDADP DKYIEATGYP GYSEKVEVAP 

301 GTKVNMGPVT DRNGNPVQVA ATFGRDAQGN TTADVQVIPR PDLTPASAEA 

351 PHAQPLPEVS PAENPANNPD PDENPGTRPN PEPDPDLNPD ANPDTDGQPG 

401 TSPDSPAVPD RPNGRHRKER KEGEDGGLSC DYFPEILACQ EMGKPSDRMF 

451 HDISIPQVTD DKTWSSHNFL PSNGVCPQPK TFHVFGRQYR ASYEPLCVFA 

501 EKIRFAVLLA FIIMSAFWF GSLGGE* 



After further analysis, the following gonococcal DNA sequence <SEQ ID 275> was identified: 

1 ATGGTCACAA AACATACAAA TTTGAATTTT GCGAAATTGT CGATAATTGC 

51 AATTTTGATG ATGTATTCGT TTGAAGCGAA TGCAAATGCA GTAAAAATAT 

101 CTGAAACTCT TTCGGTTGAT ACCGGACAAG GCGCGAAAGT TCATAAGTTC 

151 GTTCCTAAAT CAAGTAATAT TTATTCATCT GATTTAACAA AAGCGGTAGA 

201 TTTAACGCAT ATCCCCACGG GCGCAAAAGC CCGAATCAAC GCCAAAATAA 

251 CCGCCAGCGT ATCCCGCGCC GGCGTATTGT CGGGGGTCGG CAAACTTGTC 

301 CGCCAAGGCG CGAAATTCGG CACAAGGGCG GTTCCCTATG TCGGAACAGC 

351 CCTTTTAGCC CACGACGTAT ACGAAACTTT CAAAGAAGAC ATACAGGCAC 

401 GAGGCTGCCG ATACGATCCC GAAACCGACA AATTT 



This corresponds to the amino acid sequence <SEQ ID 276; ORF72ng-1>:

1 MVTKHTNLNF AKLSIIAILM MYSFEANANA VKISETLSVD TGQGAKVHKF
51 VPKSSNIYSS DLTKAVDLTH IPTGAKARIN AKITASVSRA GVLSGVGKLV 
101 RQGAKFGTRA VPYVGTALLA HDVYETFKED IQARGCRYDP ETDKF 

ORF72ng-1 and ORF72-1 show 89.7% identity in 145 aa overlap:

                     10        20        30        40        50        60
orf72ng-1.pe MVTKHTNLNFAKLSIIAILMMYSFEANANAVKISETLSVDTGQGAKVHKFVPKSSNIYSS
             || |:|||||||||||||||||||||||||||||||:|||||||||:||||||:|: |||
orf72-1      MVIKYTNLNFAKLSIIAILMMYSFEANANAVKISETVSVDTGQGAKIHKFVPKNSKTYSS
                     10        20        30        40        50        60

                     70        80        90       100       110       120
orf72ng-1.pe DLTKAVDLTHIPTGAKARINAKITASVSRAGVLSGVGKLVRQGAKFGTRAVPYVGTALLA
             || |:||||||||||||||||||||||||||||:|||||:| ||||:|||||||||||||
orf72-1      DLIKTVDLTHIPTGAKARINAKITASVSRAGVLAGVGKLARLGAKFSTRAVPYVGTALLA
                     70        80        90       100       110       120

                    130       140
orf72ng-1.pe HDVYETFKEDIQARGCRYDPETDKF
             ||||||||||||||| :||||||||
orf72-1      HDVYETFKEDIQARGYQYDPETDKFAKVSGX
                    130       140       150

Based on this analysis, including the presence of a putative leader sequence and transmembrane 
domains in the gonococcal protein, it is predicted that the proteins from N. meningitidis and 
N.gonorrhoeae, and their epitopes, could be useful antigens for vaccines or diagnostics, or for
raising antibodies. 

Example 33 

The following partial DNA sequence was identified in N.meningitidis <SEQ ID 277>:

1 ATGAGATTTT TCGGTATCGG TTTTTTGGTG CTGCTGTTTT TGGAGATTAT 

51 GTCGATTGTG TGGGTTGCCG ATTGGCTGGG CGGCGGCTGG ACGTTGTTTT 

101 TGATGGCGGC AGGTTTTGCC GCCGGCGTGC TGATGCTCAG GCAAACCGGG 

151 gCTGACCGGT CTTTTATTGG CGGGCGCGGC AATGAGAAGC GGCGGGAAGG 

201 TATCCGTTTA TCAGATGTTG TGGCCTATC. . 

This corresponds to the amino acid sequence <SEQ ID 278; ORF73>: 

1 MRFFGIGFLV LLFLEIMSIV WVADWLGGGW TLFLMAAGFA AGVLMLRQTG 
51 LTGLLLAGAA MRSGGKVSVY QMLWPI . . 

Further work revealed the complete nucleotide sequence <SEQ ID 279>: 

1 ATGAGATTTT TCGGTATCGG TTTTTTGGTG CTGCTGTTTT TGGAGATTAT 

51 GTCGATTGTG TGGGTTGCCG ATTGGCTGGG CGGCGGCTGG ACGTTGTTTT 

101 TGATGGCGGC AGGTTTTGCC GCCGGCGTGC TGATGCTCAG GCATACGGGG 

151 CTGTCCGGTC TTTTATTGGC GGGCGCGGCA ATGAGAAGCG GCGGGAGGGT 

201 ATCCGTTTAT CAGATGTTGT GGCCTATCCG TTATACGGTG GCGGCTGTGT 

251 GTCTGATGAG TCCGGGATTC GTATCCTCGG TGTTGGCGGT ATTGCTGCTG 

301 CTGCCGTTTA AGGGAGGGGC AGTGTTGCAG GCAGGAGGTG CGGAAAATTT 

351 TTTCAACATG AACCAATCGG GCAGTU^AAGA GGGCTTTTCC CGCGATGACG 

401 ATATTATCGA GGGAGAATAT ACGGTTGAAG AGCCTTACGG CGGCAATCGT 

451 TCCCGAAACG CCATCGAACA CAAAAAAGAC GAATAA 

This corresponds to the amino acid sequence <SEQ ID 280; ORF73-1>:

1 MRFFGIGFLV LLFLEIMSIV WVADWLGGGW TLFLMAAGFA AGVLMLRHTG 
51 LSGLLLAGAA MRSGGRVSVY QMLWPIRYTV AAVCLMSPGF VSSVLAVLLL
101 LPFKGGAVLQ AGGAENFFNM NQSGRKEGFS RDDDIIEGEY TVEEPYGGNR
151 SRNAIEHKKD E*

Computer analysis of this amino acid sequence gave the following results: 
Homology with a predicted ORF from N.meningitidis (strain A)

ORF73 shows 90.8% identity over a 76aa overlap with an ORF (ORF73a) from strain A of N.
meningitidis:

10 20 30 40 50 60 

orf 73 . pep MRFFGIGFLVLLFLEIMSIVWVADWLGGGWTLFLMAAGFAA GVLMLRQTGLTGLLLAGAA 
I I I I I I I M It I t I I I I ) I t I 1 I I i I I t I I t I I I tl I I I I i t : i M : I I i : I I I M I I i 
orf 7 3a MRFFGIGFLVLLFLEIMSIVWVADWLGGGWTLFLMAATFAA GWMLRHTGLSGLLLAGAA 

10 20 30 40 50 60 

70 

or f 7 3 . pep MRSGGKVS VYQMLWP I 
I I t t I : I t I I Ml i 

orf 73a MRSGGRVSVYXMLWXIRYTVAAV CXMSPGFVSSVXAVLLXL PFKGGAVLQAGGAENFFNM 

The complete length ORF73a nucleotide sequence <SEQ ID 281> is: 



1 ATGAGATTTT TCGGTATCGG TTTTTTGGTG CTGCTGTTTT TGGAGATTAT 

51 GTCGATTGTG TGGGTTGCCG ATTGGTTGGG CGGCGGTTGG ACGCTGTTTC 

101 TAATGGCGGC AACCTTTGCC GCCGGCGTGG TGATGCTCAG GCATACGGGG 

151 CTGTCCGGTC TTTTATTGGC GGGCGCGGCA ATGAGAAGCG GCGGGAGGGT 

201 ATCCGTTTAT CANATGTTGT GGCNTATCCG TTATACGGTG GCGGCGGTGT 

251 GTCNGATGAG TCCGGGATTC GTATCCTCGG TGTNGGCGGT ATTGCTGNTG 

301 CTNCCGTTTA AGGGAGGTGC AGTGTTGCAG GCAGGAGGTG CGGAAAATTT 

351 TTTCAACATG AACCANTCGG GCAGAAAAGA NGGCNTTTCC CGCGATGACG 

401 ATATTATCGA GGGGGAATAT ACGGTTGAAG ANCCTTACGG CGGCANTCGT 

451 TTCCGAAACG CCNTNGAACA CAAAAAAGAC GAATAA 

This encodes a protein having amino acid sequence <SEQ ID 282>: 



1 MRFFGIGFLV LLFLEIMSIV WVADWLGGGW TLFLMAATFA AGVVMLRHTG

51 LSGLLLAGAA MRSGGRVSVY XMLWXIRYTV AAVCXMSPGF VSSVXAVLLX

101 LPFKGGAVLQ AGGAENFFNM NXSGRKXGXS RDDDIIEGEY TVEXPYGGXR 

151 FRNAXEHKKD E* 

ORF73a and ORF73-1 show 91.3% identity in 161 aa overlap:



10 20 30 40 50 60 

orf 73a . pep MRFFGIGFLVLLFLEIMSIVWADWLGGGWTLFLMAATFAAGVVMLRHTGLSGLLLAGAA 
I I II I I I I t I t I II I I I I I I I II li I I II M II II t I I I li t : II I I I I I I I I II I I I I 
O r f 7 3 - 1 MRFFG IGFLVLLFLE IMS I VWVADWLGGGWTLFLMAAGFAAGVLMLRHTG LSGLLLAGAA 

10 20 30 40 50 60 



70 80 90 100 110 120 

orf 73a . pep MRSGGRVSVYXMLWXIRYTVAAVCXMSPGFVSSVXAVLLXLPFKGGAVLQAGGAENFFNM 
t I I I M I I I I III I I I I I li II II I II I I I I MM II II I I I I M M I II I I I t I 
or f 7 3- 1 MRSGGRVSVYQMLWPIRYTVAAVCLMSPGFVSSVLAVLLLLPFKGGAVLQAGGAENFFNM 

70 80 90 100 110 120 



130 140 150 160 

orf 73a . pep NXSGRKXGXSRDDDIIEGEYTVEXPYGGXRFRNAXEHKKDEX 
I Mil I I M I II I M I I li I I I I I I III M II I I I 
orf 73-1 NQSGRKEGFSRDDDIIEGEYTVEEPYGGNRSRNAIEHKKDEX 

130 140 150 160 



Homology with a predicted ORF from N.gonorrhoeae

ORF73 shows 92.1% identity over a 76aa overlap with a predicted ORF (ORF73.ng) from N. 
gonorrhoeae: 



orf 73 .pep MRFFGIGFLVLLFLEIMSIVWVADWLGGGWTLFLMAAGFAAGVLMLRQTGLTGLLLAGAA 60 
I I M II I I II M I M I II I I I I II M II M I I II M I II II I I I II : I II : 11 I I II M 
orf73ng MRFFGIGFLVLLFLEIMSIVWVADWLGGGWTLFLMAATFAAGVLMLRHTGLSGLLLAGAA 60

orf73.pep MRSGGKVSVYQMLWPI 76

orf73ng VKSSGKVSVYQMLWPIRYTVAAVCLMSPGFVSSVLAVLLLLPFKGGAVLQAGGAENFFNM 120 

The complete length ORF73ng nucleotide sequence <SEQ ID 283> is: 



1 ATGAGATTTT TCGGTATCGG TTTTTTGGTG CTGCTGTTTT TGGAAATTAT 

51 GTCGATTGTG TGGGTTGCCG ATTGGCTGGG CGGCGGTTGG AcgcTGTTTC 

101 TAATGGCGGC AACCTTTGCC GCCGGTGTGC TGATGCTCAG GCATAcggGG 

151 CTGTCCGGTC TTTTATTGGC TGGCGCGGCG GTAAAAagta gtgGGAAGGT

201 ATCTGTTTAT CagatgtTGT GGCCTATCCG TTATAcggtg gcggcggtgT 

251 GTCTGatgag tCcggGATTC GTATCCTccg tgttggCGGT ATTGCTGCTG 

301 CTGCcgttta aggGaggGgc agtgttgcag gcaggaggtg cggaaaATTT 

351 TTTCAACATg aaCcaatcgg gcagaaAaga gggatttttc cacgatgacg 

401 atattatcga gggagaatat acggttgaaa aacctgacgg cggcaatcgt

451 tcccgaAAcg ccatcgaaca cgaaaAagac gaataA 

This encodes a protein having amino acid sequence <SEQ ID 284>: 



1 MRFFGIGFLV LLFLEIMSIV WVADWLGGGW TLFLMAATFA AGVLMLRHTG 

51 LSGLLLAGAA VKSSGKVSVY QMLWPIRYTV AAVCLMSPGF VSSVLAVLLL

101 LPFKGGAVLQ AGGAENFFNM NQSGRKEGFF HDDDIIEGEY TVEKPDGGNR

151 SRNAIEHEKD E* 

ORF73ng and ORF73-1 show 93.8% identity in 161 aa overlap:



10 20 30 40 50 60 

orf73-1.pep MRFFGIGFLVLLFLEIMSIVWVADWLGGGWTLFLMAAGFAAGVLMLRHTGLSGLLLAGAA
I I I ) i I I I I I I t I I I i I I I I I t I t I t I I I I I I I I I i t I t I I M I I I I I I I I I I I It I i I

orf73ng MRFFGIGFLVLLFLEIMSIVWVADWLGGGWTLFLMAATFAAGVLMLRHTGLSGLLLAGAA 

10 20 30 40 50 60 






70 80 90 100 110 120 

orf 73-1 . pep MRSGGRVSVYQMLWPIRYTVAAVCLMSPGFVSSVLAVLLLLPFKGGAVLQAGGAENFFNM 
:: I : I : I I I I M I i t M I I t I I I M I I I I I I I I I I I I I I I I I I i I I I I I I M I I I I I I I I 
orf73ng VKSSGKVSVYQMLWPIRYTVAAVCLMSPGFVSSVLAVLLLLPFKGGAVLQAGGAENFFNM 

70 80 90 100 110 120 






130 140 150 160 

orf 73-1 . pep NQSGRKEGFSRDDDIIEGEYTVEEPYGGNRSRNAIEHKKDEX 
tlllMIII :l)lltilliMt:| I I I t I I I I I I I : M 1 I 
orf73ng NQSGRKEGFFHDDDIIEGEYTVEKPDGGNRSRNAIEHEKDEX 

130 140 150 160 



Based on this analysis, including the presence of a putative leader sequence and putative
transmembrane domain in the gonococcal protein, it is predicted that the proteins from
N.meningitidis and N.gonorrhoeae, and their epitopes, could be useful antigens for vaccines or
diagnostics, or for raising antibodies. 
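
The text does not state how putative transmembrane domains were predicted. As one commonly used illustration (an assumption, not the method used here), the Python sketch below computes a Kyte-Doolittle hydropathy profile with a 19-residue sliding window and flags windows whose mean hydropathy is at least 1.6, a conventional heuristic for candidate membrane-spanning segments. The function names are hypothetical, and sequences containing `X` or `*` would need those characters removed first.

```python
# Kyte-Doolittle hydropathy index for the 20 standard amino acids.
KD = {"A": 1.8, "R": -4.5, "N": -3.5, "D": -3.5, "C": 2.5,
      "Q": -3.5, "E": -3.5, "G": -0.4, "H": -3.2, "I": 4.5,
      "L": 3.8, "K": -3.9, "M": 1.9, "F": 2.8, "P": -1.6,
      "S": -0.8, "T": -0.7, "W": -0.9, "Y": -1.3, "V": 4.2}


def hydropathy_profile(seq, window=19):
    """Mean hydropathy of every full window along the sequence."""
    values = [KD[aa] for aa in seq]
    return [sum(values[i:i + window]) / window
            for i in range(len(values) - window + 1)]


def hydrophobic_windows(seq, window=19, threshold=1.6):
    """1-based (start, end) positions of windows at or above the threshold."""
    return [(i + 1, i + window)
            for i, score in enumerate(hydropathy_profile(seq, window))
            if score >= threshold]


# First 50 residues of the gonococcal ORF73ng protein (SEQ ID 284).
orf73ng_head = "MRFFGIGFLVLLFLEIMSIVWVADWLGGGWTLFLMAATFAAGVLMLRHTG"
print(hydrophobic_windows(orf73ng_head)[:3])
```

A run of overlapping windows above the threshold near the N-terminus is the kind of signal that would be consistent with a leader sequence or membrane-spanning segment, although dedicated prediction tools apply more refined criteria.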



Example 34 

The following partial DNA sequence was identified in N.meningitidis <SEQ ID 285>:

1 ATGTTTGTTT TTCAGACGGC ATTCTT.ATG TTTCAGAAAC ATTTGCAGAA 

51 AGCCTCCGAC AGCGTCGTCG GAGGGACATT ATACGTGGTT GCCACGCCCA 

101 TCGGCAATTT GGCGGACATT ACCCTGCGCG CTTTGGCGGT ATTGCAAAAG 

151 GCG GCCGA AGACACGCGC GTTACCGCAC AGCTTTTGAG 

201 CGCGTACGGC ATTCAGGGCA AACTCGTCAG TGTGCGCGAA CACAACGAAC

251 GGCAGATGGC GGACAAGATT GTCGGCTATC TTTCAGACGG CATGGTTGTG 

301 GCACAGGTTT CCGATGCGGG TACGCCGGCC GTGTGCGACC CGGGCGCGAA 

351 ACTCGCCCGC CGCGTGCGTG AGGCCGGGTT TAAAGTCGTT CCCGTCGTGG 

401 GCGCAAC.GC GGTGATGGCG GCTTTGAGCG TGGCCGGTGT GGAAGGATCC 

451 GATTTTTATT TCAACGGTTT TGTACCGCCG AAATCGGGAG AACGCAGGAA

501 ACTGTTTGCC AAATGGGTGC GGGCGGCGTT TCCTATCGTC ATGTTTGAAA

551 CGCCGCACCG CATCGGTGCA GCGCTTGCCG ATATGGCGGA ACTGTTCCCC 

601 GAACGCCGAT TAATGCTGGC GCGCGAAATT ACGAAAACGT TTGAAACGTT 

651 CTTAAGCGGC ACGGTTGGGG AAATTCAGAC GGCATTGTCT GCCGACGGCG 

701 ACCAATCGCG CGGCGAGATG GTGTTGGTGC TTTATCCGGC GCAGGATGAA 

751 AAACACGAAG GCTTGTCCGA GTCCGCGCAA AACATCATGA AAATCCTCAC 

801 AGCCGAGCTG CCGACCAAAC AGGCGGCGGA GCTTGCTGCC 7VAAATCACGG 

851 GCGAGGGAAA GAAAGCTTTG TACGAT. . 

This corresponds to the amino acid sequence <SEQ ID 286; ORF75>: 



1 MFVFQTAFXM FQKHLQKASD SVVGGTLYVV ATPIGNLADI TLRALAVLQK

51 A AEDTR VTAQLLSAYG IQGKLVSVRE HNERQMADKI VGYLSDGMVV

101 AQVSDAGTPA VCDPGAKLAR RVREAGFKVV PVVGAXAVMA ALSVAGVEGS

151 DFYFNGFVPP KSGERRKLFA KWVRAAFPIV MFETPHRIGA ALADMAELFP 

201 ERRLMLAREI TKTFETFLSG TVGEIQTALS ADGDQSRGEM VLVLYPAQDE 

251 KHEGLSESAQ NIMKILTAEL PTKQAAELAA KXTGEGKKAL YD.. 

Further work revealed the complete nucleotide sequence <SEQ ID 287>: 



1 ATGTTTCAGA AACATTTGCA GAAAGCCTCC GACAGCGTCG TCGGAGGGAC 

51 ATTATACGTG GTTGCCACGC CCATCGGCAA TTTGGCGGAC ATTACCCTGC 

101 GCGCTTTGGC GGTATTGCAA AAGGCGGACA TCATCTGTGC CGAAGACACG 

151 CGCGTTACCG CACAGCTTTT GAGCGCGTAC GGCATTCAGG GCAAACTCGT 

201 CAGTGTGCGC GAACACAACG AACGGCAGAT GGCGGACAAG ATTGTCGGCT 

251 ATGTTTCAGA CGGCATGGTT GTGGCACAGG TTTCCGATGC GGGTACGCCG 

301 GCCGTGTGCG ACCCGGGCGC GAAACTCGCC CGCCGCGTGC GTGAGGCCGG 

351 GTTTAAAGTC GTTCCCGTCG TGGGCGCAAG CGCGGTGATG GCGGCTTTGA 

401 GCGTGGCCGG TGTGGAAGGA TCCGATTTTT ATTTCAACGG TTTTGTACCG 

451 CCGAAATCGG GAGAACGCAG GAAACTGTTT GCCAAATGGG TGCGGGCGGC 

501 GTTTCCTATC GTCATGTTTG AAACGCCGCA CCGCATCGGT GCGACGCTTG 

551 CCGATATGGC GGAACTGTTC CCCGAACGCC GATTAATGCT GGCGCGCGAA 

601 ATTACGAAAA CGTTTGAAAC GTTCTTAAGC GGCACGGTTG GGGAAATTCA 

651 GACGGCATTG TCTGCCGACG GCAACCAATC GCGCGGCGAG ATGGTGTTGG 

701 TGCTTTATCC GGCGCAGGAT GAAAAACACG AAGGCTTGTC CGAGTCCGCG 

751 CAAAACATCA TGAAAATCCT CACAGCCGAG CTGCCGACCA AACAGGCGGC 

801 GGAGCTTGCT GCCAAAATCA CGGGCGAGGG AAAGAAAGCT TTGTACGATC 

851 TGGCTCTGTC TTGGAAAAAC AAATAG 

This corresponds to the amino acid sequence <SEQ ID 288; ORF75-1>:



1 MFQKHLQKAS DSVVGGTLYV VATPIGNLAD ITLRALAVLQ KADIICAEDT

51 RVTAQLLSAY GIQGKLVSVR EHNERQMADK IVGYLSDGMV VAQVSDAGTP

101 AVCDPGAKLA RRVREAGFKV VPVVGASAVM AALSVAGVEG SDFYFNGFVP

151 PKSGERRKLF AKWVRAAFPI VMFETPHRIG ATLADMAELF PERRLMLARE

201 ITKTFETFLS GTVGEIQTAL SADGNQSRGE MVLVLYPAQD EKHEGLSESA

251 QNIMKILTAE LPTKQAAELA AKITGEGKKA LYDLALSWKN K*

Computer analysis of this amino acid sequence gave the following results: 



Homology with a predicted ORF from N.meningitidis (strain A)

ORF75 shows 95.8% identity over a 283aa overlap with an ORF (ORF75a) from strain A of N. 
meningitidis: 

10 20 30 40 50 60 

orf 7 5 . pep MFVFQTAFXMFQKHLQKASDSWGGTLYWATPIGNLADITLRALAVLQKAXXXXAEDTR 

i t i I I I I I I I I I I I I I I I I I t I t I i I I I It I i 1 i 1 I I I I I t i i I I t i 

orf 75a MFQKHLQKAS DSWGGTLYWATP I GNLADITLRALAVLQKADII CAE DTR 

10 20 30 40 50 



70 80 90 100 110 120 

orf 75. pep VTAQLLSAYG I QGKLVSVREHNERQMADKIVGYLSDGMWAQVSDAGTPAVCDPGAKLAR 
IMMIMIMtllMMIillltlltllllllllllllMIIMIIIIIillltllMI 
orf 75a VTAQLLSAYGIQGKLVSVREHNERQMADKIVGYLSDGMWAQVSDAGTPAVCDPGAKLAR 
60 70 80 90 100 110 



orf75.pep 



130 140 150 160 170 180 

RVREAGFKWPWGAXAVMAALSVAGVEGSDFYFNGFVPPKSGERRKLFAKWVRAAFPIV 
I I I I : I I I I I I I i I I II I II Ml I II I I I I I II II I I I n I I I I I II I t M I: I II: I
orf75a RVREVGF KWPWGASAVMAALSVA GVAGSDFYFNGFVPPKSGERRKLFAKWVRVAFPW 
120 130 140 150 160 170 



190 200 210 220 230 240 

or f 7 5 . pep MFETPHRIGAALADMAELFPERRLMLAREITKTFETFLSGTVGEIQTALSADGDQSRGEM 
II I I i I I I I I : I I I II I I I I I I II I I II I I II I I I I I II II I I I II t M : I I I : I I I I I I 
orf75a MFETPHRIGATLADMAELFPERRLMLAREITKTFETFLSGTVGEIQTALAADGNQSRGEM 
180 190 200 210 220 230 



250 260 270 280 290 

orf 75 . pep VLVLYPAQDEKHEGLSESAQNIMKILTAELPTKQAAELAAKITGEGECKALYD 
II t t I t I I I 1 I I I I t I I I t I I II II II II II t I I I II I I I I II i I II I I I I I 
orf 75a VLVLYPAQDEKHEGLSESAQNIMKILTAELPTKQAAELAAKITGEGKKALYDLALSWKNK 
240 250 260 270 280 290 



orf75a X 

The complete length ORF75a nucleotide sequence <SEQ ID 289> is: 

1 ATGTTTCAGA AACATTTGCA GAAAGCCTCC GACAGCGTCG TCGGAGGGAC 

51 ATTATACGTG GTTGCCACGC CCATCGGCAA TTTGGCGGAC ATTACCCTGC 

101 GCGCTTTGGC GGTATTGCAA AAGGCGGACA TCATCTGTGC CGAAGACACG 

151 CGCGTTACCG CGCAGCTTTT GAGCGCGTAC GGCATTCAGG GCAAACTCGT 

201 CAGCGTGCGC GAACACAACG AACGGCAGAT GGCGGACAAG ATTGTCGGCT 

251 ATGTTTCAGA CGGCATGGTT GTGGCACAGG TTTCCGATGC GGGTACGCCG 

301 GCCGTGTGCG ACCCGGGCGC GAAACTCGCC CGCCGCGTGC GTGAGGTCGG 

351 GTTTAAAGTT GTCCCTGTTG TCGGCGCAAG CGCGGTGATG GCGGCTTTGA 

401 GTGTGGCTGG TGTGGCGGGA TCCGATTTTT ATTTCAACGG TTTTGTACCG 

451 CCGAAATCGG GCGAACGTAG GAAATTGTTT GCCAAATGGG TGCGGGTGGC 

501 GTTTCCCGTC GTGATGTTTG AAACGCCGCA CCGCATCGGG GCGACGCTTG 

551 CCGATATGGC GGAACTGTTC CCCGAACGCC GATTAATGCT GGCGCGCGAA 

601 ATCACGAAAA CGTTTGAAAC GTTCTTAAGC GGCACGGTTG GGGAAATTCA 

651 GACGGCATTG GCGGCGGACG GCAACCAATC GCGCGGCGAG ATGGTGTTGG 

701 TGCTTTATCC GGCGCAGGAT GAAAAACACG AAGGCTTGTC CGAGTCCGCG 

751 CAAAACATCA TGAAAATCCT CACAGCCGAG CTGCCGACCA AACAGGCGGC 

801 GGAGCTTGCC GCCAAAATCA CGGGCGAGGG AAAAAAAGCT TTGTACGATC 

851 TGGCACTGTC TTGGAAAAAC AAATGA 

This encodes a protein having amino acid sequence <SEQ ID 290>: 



1 MFQKHLQKAS DSVVGGTLYV VATPIGNLAD ITLRALAVLQ KADIICAEDT

51 RVTAQLLSAY GIQGKLVSVR EHNERQMADK IVGYLSDGMV VAQVSDAGTP

101 AVCDPGAKLA RRVREVGFKV VPVVGASAVM AALSVAGVAG SDFYFNGFVP

151 PKSGERRKLF AKWVRVAFPV VMFETPHRIG ATLADMAELF PERRLMLARE 

201 ITKTFETFLS GTVGEIQTAL AADGNQSRGE MVLVLYPAQD EKHEGLSESA 

251 QNIMKILTAE LPTKQAAELA AKITGEGKKA LYDLALSWKN K* 

ORF75a and ORF75-1 show 98.3% identity in 291 aa overlap: 



10 20 30 40 50 60 

orf 75a. pep MFQKHLQKASDSWGGTLYWATPIGNLADITLRALAVLQKADIICAEDTRVTAQLLSAY 
I I I I II II I I t I I I I I I I I i I II I I I I I M I I I I I I I I i It I M I I I I I I I I I I I I I t II 
orf 75-1 MFQKHLQKASDSWGGTLYWATPIGNLADITLRALAVLQKADIICAEDTRVTAQLLSAY 

10 20 30 40 50 60 



70 80 90 100 110 120 

orf 7 5a . pep GIQGKLVSVREHNERQMADKIVGYLSEXBMWAQVSDAGTPAVCDPGAKLARRVREVGFKV 
I I I I I I I I I I I I M I I I I I I I II I II I I t II I I I I I i I I II I I I I t I 1 I I 1 I I I I : I I M 
orf 75-1 GIQGKLVSVREHNERQMADKIVGYLSDGMWAQVSDAGTPAVCDPGAKLARRVREAGFKV 

70 80 90 100 110 120 



130 140 150 160 170 180 

orf 75a . pep VPWGASAVMAALSVAGVAGSDFYFNGFVPPKSGERRKLFAKWVRVAFPWMFETPHRIG 
I I I i I I I I I I I I I t I t I I I I I I I I I I I ! I I I I I I 1 I I I I I I I I I : I I I : I t I I I I I I I I 
orf 75-1 VPWGASAVMAALSVAGVEGSDFYFNGFVPPKSGERRKLFAKWVRAAFPIVMFETPHRIG 

130 140 150 160 170 180 



190 200 210 220 230 240 

orf 7 5a. pep m ATLADMAELFPERRLMLAREITKTFETFLSGTVGEIQTALAADGNQSRGEMVLVLYPAQD 
I t I I I I I I I I I I I I I ) I I t I I t I t I M I I I I II I I I I I II : I I I 1 I I I I t I I i I I I I I I I 
orf75-1 ATLADMAELFPERRLMLAREITKTFETFLSGTVGEIQTALSADGNQSRGEMVLVLYPAQD
190 200 210 220 230 240 

250 260 270 280 290 

or f 75a . pep EKHEGLSESAQNIMKILTAELPTKQAAELAAKITGEGKKALYDLALSWKNKX 
I I I I t I I I I I I i I I I I I i I I I I I I I i I I I I I I I I I I I t I I I I I I I t I I I I I I 
orf75-l EKHEGLSESAQNIMKILTAELPTKQAAELAAKITGEGKKALYDLALSWKNKX 

250 260 270 280 290 

Homology with a predicted ORF from N.gonorrhoeae

ORF75 shows 93.2% identity over a 292aa overlap with a predicted ORF (ORF75.ng) from N. 
gonorrhoeae: 

orf75.pep 
orf75ng 
or f 7 5. pep 
orf75ng 
or f 7 5. pep 
orf 75ng 
or f 7 5. pep 
orf75ng 
orf 75. pep 



MFVFQTAFXMFQKHLQKASDSWGGTLYWATPIGNLADITLRALAVLQKA AEDTR 56 

[ 1 1 1 1 1 1 1 1 1 It 1 1 1 1 1 1 M 1 1 1 1 1 1 i 1 1 i 1 1 1 1 i 1 1 1 1 1 n M I I I I I I I I I I 

MSVFQTAFFMFQKHLQKASDSWGGTLYWATPIGNLADITLRALAVLQKADIICAEDTR 60 

VTAQLLSAYGIQGKLVSVREHNERQMADKIVGYLSDGMWAQVSDAGTPAVCDPGAKLAR 116 
|||||tlllllll:tlllll[lililill::|:illl:liliMlilinillllllitl 

VTAQLLSAYGIQGRLVSVREHNERQMADKVIGFLSDGLVVAQVSDAGTPAVCDPGAKLAR 120

RVREAGFKWPWGAXAVMAALSVAGVEGSDFYFNGFVPPKSGERRKLFAKWVRAAFPIV 176 

t I i t I i I I I I I I I i I f I I i I I I I 1) I I t M I I t I M I I I I I I i I I I I I I I I I I [ I : I 

RVREAGFKWPWGASAVMAALSVAGVAESDFYFNGFVPPKSGERRKLFAKWVRAAFPW 180 

MFETPHRIGAALADMAELFPERRLMLAREITKTFETFLSGTVGEIQTALSADGDQSRGEM 236 
||||IIMII:|llltlllllMllllllllttlllllilltlltllll:lll:|llltl 

MFETPHRIGATLADMAELFPERRLMLAREITKTFETFLSGTVGEIQTALAADGNQSRGEM 240

VLVLYPAQDEKHEGLSESAQNIMKILTAELPTKQAAELAAKITGEGKKALYD 288 
t I I I I I t n I I I I I I I 1 I I I t I I I I : I I It I i I I M I I M I I 1 t I I I I I I i 

VLVLYPAQDEKHEGLSESAQNAMKILAAELPTKQAAELAAKITGEGKKALYDLALSWKNK 300



orf75ng 

An ORF75ng nucleotide sequence <SEQ ID 291> was predicted to encode a protein having amino
acid sequence <SEQ ID 292>:



1 MSVFQTAFFM FQKHLQKASD SVVGGTLYVV ATPIGNLADI TLRALAVLQK
51 ADIICAEDTR VTAQLLSAYG IQGRLVSVRE HNERQMADKV IGFLSDGLVV
101 AQVSDAGTPA VCDPGAKLAR RVREAGFKVV PVVGASAVMA ALSVAGVAES
151 DFYFNGFVPP KSGERRKLFA KWVRAAFPVV MFETPHRIGA TLADMAELFP
201 ERRLMLAREI TKTFETFLSG TVGEIQTALA ADGNQSRGEM VLVLYPAQDE
251 KHEGLSESAQ NAMKILAAEL PTKQAAELAA KITGEGKKAL YDLALSWKNK
301 *



After further analysis, the following gonococcal DNA sequence <SEQ ID 293> was identified:



1 ATGTTTCAGA AACACTTGCA GAAAGCCTCC GACAGCGTCG TCGGAGGGAC
51 ATTATACGTG GTTGCCACGC CCATCGGCAA TTTGGCAGAC ATTACCCTGC
101 GCGCTTTGGC GGTATTGCAA AAGGCGGACA TCATTTGTGC CGAAGACACG
151 CGCGTTACTG CGCAGCTTTT GAGCGCGTAC GGCATTCAGG GCAGGTTGGT
201 CAGTGTGCGC GAACACAACG AGCGGCAGAT GGCGGACAAG GTAATCGGTT
251 TCCTTTCAGA CGGCCTGGTT GTGGCGCAGG TTTCCGATGC GGGTACGCCG
301 GCCGTGTGCG ACCCGGGCGC GAAACTCGCC CGCCGCGTGC GCGAAGCAGG
351 GTTCAAAGTC GTTCCCGTCG TGGGCGCAAG CGCGGTAATG GCGGCGTTGA
401 GTGTGGCCGG TGTGGCGGAA TCCGATTTTT ATTTCAACGG TTTTGTACCG
451 CCGAAATCGG GCGAACGTAG GAAATTGTTT GCCAAATGGG TGCGGGCGGC
501 ATTTCCTGTC GTCATGTTTG AAACGCCGCA CCGAATCGGG GCAACGCTTG
551 CCGATATGGC GGAATTGTTC CCCGAACGCC GTCTGATGCT GGCGCGCGAA
601 ATCACGAAAA CGTTTGAAAC GTTCTTAAGC GGCACGGTTG GGGAAATTCA
651 GACGGCATTG GCGGCGGACG GCAACCAATC GCGCGGCGAG ATGGTGTTGG
701 TGCTTTATCC GGCGCAGGAT GAAAAACACG AAGGCTTGTC CGAGTCTGCG
751 CAAAATGCGA TGAAAATCCT TGCGGCCGAG CTGCCGACCA AGCAGGCGGC
801 GGAGCTTGCC GCCAAGATTA CAGGTGAGGG CAAAAAGGCT TTGTACGATT
851 TGGCACTGTC GTGGAAAAAC AAATGA



This corresponds to the amino acid sequence <SEQ ID 294; ORF75ng-1>:
1 MFQKHLQKAS DSVVGGTLYV VATPIGNLAD ITLRALAVLQ KADIICAEDT
51 RVTAQLLSAY GIQGRLVSVR EHNERQMADK VIGFLSDGLV VAQVSDAGTP
101 AVCDPGAKLA RRVREAGFKV VPVVGASAVM AALSVAGVAE SDFYFNGFVP
151 PKSGERRKLF AKWVRAAFPV VMFETPHRIG ATLADMAELF PERRLMLARE
201 ITKTFETFLS GTVGEIQTAL AADGNQSRGE MVLVLYPAQD EKHEGLSESA
251 QNAMKILAAE LPTKQAAELA AKITGEGKKA LYDLALSWKN K*



ORF75ng-1 and ORF75-1 show 96.2% identity in 291 aa overlap:






10 20 30 40 50 60 

orf 75-1 . pep MFQKHLQKASDSWGGTLYWATPIGNLADITLRALAVLQKADIICAEDTRVTAQLLSAY 
I I I I I I I I I I I I I I I I I I t I t I I I M I I I i I I I I t I I I t I I M I I I I t I I I I i I I t I I I I 
orf75ng-l MFQKHLQKAS DSWGGTLYWATPIGNLADITLRALAVLQKADIICAEDTRVTAQLLSAY 

10 20 30 40 50 60 

70 80 90 100 110 120 

or f 7 5- 1 . pep GIQGKLVSVREHNERQMADKIVGYLSDGMWAQVSDAGTPAVCDPGAKLARRVREAGFKV 
llli:||||liiltl)llll::|:llll:IIIMIIMtllllllllltlll)IIIIMI 
orf75ng-l GIQGRLVSVREHNERQMADKVIGFLSDGLWAQVSDAGTPAVCDPGAKLARRVREAGFKV 

70 80 90 100 110 120 

130 140 150 160 170 180 

orf 75-1 . pep V P WG AS AVMAALS VAGVEGSD FY FNGFVPPKSGERRKLFAKWVRAAFP I VMFETPHRIG 
t I I I I i I I I I I I I t I I I t I I I I I t i I I I I I I M I I I [ I I I I I I I I I t : I I I I M I M I 
orf75ng-l VPWGASAVMAALSVAGVAESD FY FNGFVPPKSGERRKLFAKWVRAAFP WMFETPHRIG 

130 140 150 160 170 180 

190 200 210 220 230 240 

orf75-1.pep ATLADMAELFPERRLMLAREITKTFETFLSGTVGEIQTALSADGNQSRGEMVLVLYPAQD
I i I I I I i I I I M t I M I i I I M I I I I I I I I I I I I I M I M : I I I I I I I i I I t I [ t I I t I I 
orf75ng-l ATLADMAELFPERRLMLAREITKTFETFLSGTVGEIQTALAADGNQSRGEMVLVLYPAQD 

190 200 210 220 230 240 

250 260 270 280 290 

orf 75-1 . pep EKHEGLSESAQNIMKILTAELPTKQAAELAAKITGEGKKALYDLALSWKNKX 
I I I M t I I I I [ t I I M : I I I I I I I I I t I I ) 1 I I I I M [ I I I i I I i I I t I I t 
orf75ng-l EKHEGLSESAQNAMKILAAELPTKQAAELAAKITGEGKKALYDLALSWKNKX 

250 260 270 280 290 

Furthermore, ORF75ng-1 shows significant homology to a hypothetical E.coli protein:

sp|P45528|YRAL_ECOLI HYPOTHETICAL 31.3 KD PROTEIN IN AGAI-MTR INTERGENIC REGION
(F286)
>gi|606086 (U18997) ORF_f286 [Escherichia coli]
>gi|1789535 (AE000395) hypothetical 31.3 kD protein in agai-mtr intergenic
region [Escherichia coli] Length = 286
Score = 218 bits (550), Expect = 3e-56
Identities = 128/284 (45%), Positives = 171/284 (60%), Gaps = 4/284 (1%)

Query: 4   KHLQKASDSVVGGTLYVVATPIGNLADITLRALAVLQKADIICAEDTRVTAQLLSAYGIQ 63
           K  Q A +S   G LY+V TPIGNLADIT RAL VLQ  D+I AEDTR T  LL  +GI
Sbjct: 2   KQHQSADNSQ--GQLYIVPTPIGNLADITQRALEVLQAVDLIAAEDTRHTGLLLQHFGIN 59

Query: 64  GRLVSVREHNERQMADKVIGFLSDGLVVAQVSDAGTPAVCDPGAKLARRVREAGFKVVPV 123
            RL ++ +HNE+Q A+ ++  L +G  +A VSDAGTP + DPG  L R  REAG +VVP+
Sbjct: 60  ARLFALHDHNEQQKAETLLAKLQEGQNIALVSDAGTPLINDPGYHLVRTCREAGIRVVPL 119

Query: 124 VGASAVMAALSVAGVAESDFYFNGFVPPKSGERRKLFAKWVRAAFPVVMFETPHRIGATL 183
            G  A + ALS AG+    F + GF+P KS  RR            ++ +E+ HR+  +L
Sbjct: 120 PGPCAAITALSAAGLPSDRFCYEGFLPAKSKGRRDALKAIEAEPRTLIFYESTHRLLDSL 179

Query: 184 ADMAELFPERR-LMLAREITKTFETFLSGTVGEIQTALAADGNQSRGEMVLVLYPAQDEK 242
            D+  +  E R ++LARE+TKT+ET     VGE+   +  D N+ +GEMVL++      +
Sbjct: 180 EDIVAVLGESRYVVLARELTKTWETIHGAPVGELLAWVKEDENRRKGEMVLIV-EGHKAQ 238

Query: 243 HEGLSESAQNAMKILAAELPTKQAAELAAKITGEGKKALYDLAL 286
            E L   A   + +L AELP K+AA LAA+I G  K ALY  AL
Sbjct: 239 EEDLPADALRTLALLQAELPLKKAAALAAEIHGVKKNALYKYAL 282
Based on this analysis, including the presence of a putative transmembrane domain in the
gonococcal protein, it is predicted that the proteins from N.meningitidis and N.gonorrhoeae, and
their epitopes, could be useful antigens for vaccines or diagnostics, or for raising antibodies. 

Example 35 

The following partial DNA sequence was identified in N.meningitidis <SEQ ID 295>:

1 ATGAAACAGA AAAAAACCGC TGCCGCAGTT ATTGCTGCAA TGTTGGCAGG 

51 TTTTGCGGCA GC.AAAGCAC CCGAAATCGA CCCGGCTTTG 

// 

651 GAGTTGG TCAGAAACCA GTTGGAGCAG GGTTTGAGAC 

701 AGGAAAAAGC CCGCTTGAAA ATCGATGCCC TTTTGGAAGA AAACGGTGTC 
751 AAACCGTAA 

This corresponds to the amino acid sequence <SEQ ID 296; ORF76>: 

1 MKQKKTAAAV IAAMLAGFAA XKAPEIDPAL

// 

201 ELVRNQLEQG LRQEKARLKI DALLEENGVK 

251 P* 

Further work revealed the complete nucleotide sequence <SEQ ID 297>: 



1 ATGAAACAGA AAAAAACCGC TGCCGCAGTT ATTGCTGCAA TGTTGGCAGG 

51 TTTTGCGGCA GCCAAAGCAC CCGAAATCGA CCCGGCTTTG GTGGATACGC 

101 TGGTGGCGCA GATCATGCAG CAGGCAGACC GGCATGCGGA GCAGTCCCAA 

151 AAACCGGACG GGCAGGCAAT CCGAAACGAT GCCGTCCGCC GGCTACAAAC 

201 TTTGGAAGTT TTGAAAAACA GGGCATTGAA GGAAGGTTTG GATAAGGATA 

251 AGGATGTCCA AAACCGCTTT AAAATCGCCG AAGCGTCTTT TTATGCCGAG 

301 GAGTACGTCC GTTTTCTGGA ACGTTCGGAA ACGGTTTCCG AAGACGAGCT 

351 GCACAAGTTT TACGAACAGC AAATCCGCAT GATCAAATTG CAGCAGGTCA 

401 GCTTCGCAAC CGMGAGGAG GCGCGTCAGG CGCAGCAGCT CCTGCTCAAA 

451 GGGCTGTCTT TTGAAGGGCT GATGAAGCGT TATCCGAACG ACGAGCAGGC 

501 TTTTGACGGT TTCATTATGG CGCAGCAGCT TCCCGAGCCG CTGGCTTCGC 

551 AGTTTGCCGC GATGAATCGG GGCGACGTTA CCCGCGATCC GGTCAAATTG 

601 GGCGAACGCT ATTATCTGTT CAAACTCAGC GAGGTCGGGA AAAACCCCGA 

651 CGCGCAGCCT TTCGAGTTGG TCAGAAACCA GTTGGAGCAG GGTTTGAGAC 

701 AGGAAAAAGC CCGCTTGAAA ATCGATGCCC TTTTGGAAGA AAACGGTGTC 

751 AAACCGTAA 

This corresponds to the amino acid sequence <SEQ ID 298; ORF76-1>: 

1 MKQKKTAAAV IAAMLAGFAA AKAPEIDPAL VDTLVAQIMQ QADRHAEQSQ 
51 KPDGQAIRND AVRRLQTLEV LKNRALKEGL DKDKDVQNRF KIAEASFYAE 
101 EYVRFLERSE TVSEDELHKF YEQQIRMIKL QQVSFATEEE ARQAQQLLLK 
151 GLSFEGLMKR YPNDEQAFDG FIMAQQLPEP LASQFAAMNR GDVTRDPVKL 
201 GERYYLFKLS EVGKNPDAQP FELVRNQLEQ GLRQEKARLK IDALLEENGV 
251 KP* 
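
As an illustrative check (not part of the original analysis), the correspondence between SEQ ID 297 and SEQ ID 298 can be reproduced by translating the nucleotide sequence in its first reading frame. The sketch below assumes the standard genetic code and uses only the first 50 and the last 9 nucleotides of SEQ ID 297; the helper name `translate` is hypothetical.

```python
# Standard genetic code, built from the conventional T, C, A, G ordering.
BASES = "TCAG"
AMINO = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    a + b + c: AMINO[16 * i + 4 * j + k]
    for i, a in enumerate(BASES)
    for j, b in enumerate(BASES)
    for k, c in enumerate(BASES)
}


def translate(dna: str) -> str:
    """Translate frame 1 of a DNA string.

    Whitespace is ignored, an incomplete trailing codon is dropped, and any
    codon containing a symbol other than A, C, G or T is reported as 'X'.
    """
    dna = "".join(dna.split()).upper()
    usable = len(dna) - len(dna) % 3
    return "".join(
        CODON_TABLE.get(dna[pos:pos + 3], "X") for pos in range(0, usable, 3)
    )


n_terminus = "ATGAAACAGA AAAAAACCGC TGCCGCAGTT ATTGCTGCAA TGTTGGCAGG"  # nt 1-50
c_terminus = "AAACCGTAA"                                               # nt 751-759

print(translate(n_terminus))  # MKQKKTAAAVIAAMLA
print(translate(c_terminus))  # KP*
```

The two outputs agree with the first 16 residues and the final `KP*` of the listed protein.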

Computer analysis of this amino acid sequence gave the following results: 
Homology with a predicted ORF from N.meningitidis (strain A) 

ORF76 shows 96.7% identity over a 30aa overlap and 96.8% identity over a 31aa overlap with an 
ORF (ORF76a) from strain A of N. meningitidis: 



                    10        20        30
orf76.pep   MKQKKTAAAVIAAMLAGFAAXKAPEIDPAL
            |||||||||||||||||||| |||||||||
orf76a      MKQKKTAAAVIAAMLAGFAAAKAPEIDPALVDTLVAQIMQQADRHAEQSQKPDGQAIRND
                    10        20        30        40        50        60

//

                                                70        80        90
orf76.pep                                XELVRNQLEQGLRQEKARLKIDALLEENGVKPX
                                          ||||||||||||||||||||||:|||||||||
orf76a      DVTRDPVKLGERYYLFKLSEVGKNPDAQPFELVRNQLEQGLRQEKARLKIDAILEENGVKPX
                  200       210       220       230       240       250

The complete length ORF76a nucleotide sequence <SEQ ID 299> is: 

1 ATGAAACAGA AAAAAACCGC TGCCGCAGTT ATTGCTGCAA TGTTGGCAGG 

51 TTTTGCGGCA GCCAAAGCAC CCGAAATCGA CCCGGCTTTG GTGGATACGC 

101 TGGTGGCGCA GATCATGCAG CAGGCAGACC GGCATGCGGA GCAGTCCCAA 

151 AAACCGGACG GGCAGGCAAT CCGAAACGAT GCCGTCCGTC GGCTGCAAAC 

201 TTTGGAAGTT TTGAAAAACA GGGCATTGAA GGAAGGTTTG GATAAGGATA 

251 AGGATGTCCA AAACCGCTTT AAAATCGCCG AAGCGTCTTT TTATGCCGAG 

301 GAGTACGTCC GTTTTCTGGA ACGTTCGGAA ACGGTTTCCG AAAGCGCACT 

351 GCGTCAGTTT TATGAGCGGC AAATCCGCAT GATCAAATTG CAGCAGGTCA 

401 GCTTCGCAAC CGAAGAGGAG GCGCGTCAGG CGCAGCAGCT CCTGCTCAAA 

451 GGGCTGTCTT TTGAAGGGCT GATGAAGCGT TATCCGAACG ACGAGCAGGC 

501 TTTTGACGGT TTCATTATGG CGCAGCAGCT TCCCGAGCCG CTGGCTTCGC 

551 AGTTTGCAGC GATGAATCGG GGCGACGTTA CCCGCGATCC GGTCAAATTG 

601 GGCGAACGCT ATTATCTGTT CAAACTCAGC GAGGTCGGGA AAAACCCCGA 

651 CGCGCAGCCT TTCGAGTTGG TCAGAAACCA GTTGGAACAA GGTTTGAGAC 

701 AGGAAAAAGC CCGCTTGAAA ATCGATGCCA TTTTGGAAGA AAACGGTGTC 

751 AAACCGTAA 

This encodes a protein having amino acid sequence <SEQ ID 300>: 

1 MKQKKTAAAV IAAMLAGFAA AKAPEIDPAL VDTLVAQIMQ QADRHAEQSQ 

51 KPDGQAIRND AVRRLQTLEV LKNRALKEGL DKDKDVQNRF KIAEASFYAE 

101 EYVRFLERSE TVSESALRQF YERQIRMIKL QQVSFATEEE ARQAQQLLLK 

151 GLSFEGLMKR YPNDEQAFDG FIMAQQLPEP LASQFAAMNR GDVTRDPVKL 

201 GERYYLFKLS EVGKNPDAQP FELVRNQLEQ GLRQEKARLK IDAILEENGV 

251 KP* 

ORF76a and ORF76-1 show 97.6% identity in 252 aa overlap: 

                    10        20        30        40        50        60
orf76a.pep  MKQKKTAAAVIAAMLAGFAAAKAPEIDPALVDTLVAQIMQQADRHAEQSQKPDGQAIRND
            ||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
orf76-1     MKQKKTAAAVIAAMLAGFAAAKAPEIDPALVDTLVAQIMQQADRHAEQSQKPDGQAIRND
                    10        20        30        40        50        60

                    70        80        90       100       110       120
orf76a.pep  AVRRLQTLEVLKNRALKEGLDKDKDVQNRFKIAEASFYAEEYVRFLERSETVSESALRQF
            ||||||||||||||||||||||||||||||||||||||||||||||||||||||: |::|
orf76-1     AVRRLQTLEVLKNRALKEGLDKDKDVQNRFKIAEASFYAEEYVRFLERSETVSEDELHKF
                    70        80        90       100       110       120

                   130       140       150       160       170       180
orf76a.pep  YERQIRMIKLQQVSFATEEEARQAQQLLLKGLSFEGLMKRYPNDEQAFDGFIMAQQLPEP
            ||:|||||||||||||||||||||||||||||||||||||||||||||||||||||||||
orf76-1     YEQQIRMIKLQQVSFATEEEARQAQQLLLKGLSFEGLMKRYPNDEQAFDGFIMAQQLPEP
                   130       140       150       160       170       180

                   190       200       210       220       230       240
orf76a.pep  LASQFAAMNRGDVTRDPVKLGERYYLFKLSEVGKNPDAQPFELVRNQLEQGLRQEKARLK
            ||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
orf76-1     LASQFAAMNRGDVTRDPVKLGERYYLFKLSEVGKNPDAQPFELVRNQLEQGLRQEKARLK
                   190       200       210       220       230       240

                   250
orf76a.pep  IDAILEENGVKPX
            |||:|||||||||
orf76-1     IDALLEENGVKPX
                   250
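
As a minimal sketch of how such an identity figure can be checked, the following Python fragment counts matching columns in two aligned rows. The function name `percent_identity` is hypothetical; the two rows are residues 61-120 of ORF76a and ORF76-1 as given in SEQ ID 300 and SEQ ID 298, and the figure of six differing positions over the whole overlap is our own count from those two listings, not a value stated in the source.

```python
def percent_identity(a: str, b: str) -> float:
    """Percentage of aligned columns in which both rows carry the same residue."""
    if len(a) != len(b):
        raise ValueError("aligned rows must have equal length")
    matches = sum(1 for x, y in zip(a, b) if x == y)
    return 100.0 * matches / len(a)


# Residues 61-120 of ORF76a (SEQ ID 300) and ORF76-1 (SEQ ID 298).
orf76a_row = "AVRRLQTLEVLKNRALKEGLDKDKDVQNRFKIAEASFYAEEYVRFLERSETVSESALRQF"
orf76_1_row = "AVRRLQTLEVLKNRALKEGLDKDKDVQNRFKIAEASFYAEEYVRFLERSETVSEDELHKF"

# 56 of the 60 columns in this row are identical.
print(round(percent_identity(orf76a_row, orf76_1_row), 1))

# Six differing positions over the 252 aa overlap (assumed count, see above).
print(round(100.0 * (252 - 6) / 252, 1))
```

The second value reproduces the 97.6% quoted for the complete overlap.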

Homology with a predicted ORF from N.gonorrhoeae 

The aligned aa sequences of the N- and C-termini of ORF76 and a predicted ORF (ORF76.ng) from 
N.gonorrhoeae show 96.7% and 100% identity in 30 and 31 aa overlap, respectively: 

orf76.pep   MKQKKTAAAVIAAMLAGFAAXKAPEIDPAL  30
            |||||||||||||||||||| |||||||||
orf76ng     MKQKKTAAAVIAAMLAGFAAAKAPEIDPALVDTLVAQIMQQADRHAEQSQRPDGQAIRND  60

//

orf76.pep                                ELVRNQLEQGLRQEKARLKIDALLEENGVKP  251
                                         |||||||||||||||||||||||||||||||
orf76ng     VTRNPVKLGERYYLFKLGAVGKNPDAQPFELVRNQLEQGLRQEKARLKIDALLEENGVKP  251

The complete length ORF76ng nucleotide sequence <SEQ ID 301> is: 



1 ATGAAACAGA AAAAGACCGC TGCCGCAGTT ATTGCTGCAA TGTTGGCAGG 

51 TTTTGCGGCA GCCAAAGCAC CCGAAATCGA CCCGGCTTTG GTGGATACGC 

101 TGGTGGCGCA GATCATGCAG CAGGCAGACC GGCATGCGGA GCAGTCCCAA 

151 AGACCGGACG GGCAGGCAAT CCGAAACGAT GCCGTCCGCC GGCTGCAAAC 

201 TTTGGAAGTT TTGAAAAACA GGGCATTGAA GGAAGGTTTG GATAAGGATA 

251 AGGATGTCCA AAACCGCTTT AAAATCGCCG AAGCGTCTTT TTATGCCGAG 

301 GAGTACGTCC GTTTTCTGGA ACGTTCGGAA ACGGTTTCCG AAAGCGCACT 

351 GCGTCAGTTT TATGAGCGGC AAATCCGCAT GATCAAATTG CAGCAGGTCA 

401 GCTTCGCAAC CGAAGAGGAG GCGCGTCAGG CGCAGCAGCT CCTGCTCAAA 

451 GGGCTGTCTT TTGAAGGGCT GATGAAGCGT TATCCGAACG ACGAGCAGGC 

501 GTTCGACGGT TTCATTATGG CGCAGCAGCT TCCCGAGCCG CTGGCTTcgc 

551 agtttgCCGG TATGAACCGT GGCGACGTTA CCCGCAATCC GGTCAAATTG 

601 GGCGAACGCT ATTACCTGTT CAAACTCGGC GCGGTCGGGA AAAACCCCGA 

651 CGCGCAGCCT TTCGAGTTGG TCAGAAACCA GTTGGAACAA GGTTTGAGGC 

701 AGGAAAAAGC CCGCTTGAAA ATCGATGCCC TTTTGGAaga Aaacggtgtc 

751 AaacCGTAA 

This encodes a protein having amino acid sequence <SEQ ID 302>: 



1 MKQKKTAAAV IAAMLAGFAA AKAPEIDPAL VDTLVAQIMQ QADRHAEQSQ 

51 RPDGQAIRND AVRRLQTLEV LKNRALKEGL DKDKDVQNRF KIAEASFYAE 

101 EYVRFLERSE TVSESALRQF YERQIRMIKL QQVSFATEEE ARQAQQLLLK 

151 GLSFEGLMKR YPNDEQAFDG FIMAQQLPEP LASQFAGMNR GDVTRNPVKL 

201 GERYYLFKLG AVGKNPDAQP FELVRNQLEQ GLRQEKARLK IDALLEENGV 

251 KP* 

ORF76ng and ORF76-1 show 96.0% identity in 252 aa overlap: 

                    10        20        30        40        50        60
orf76-1.pep MKQKKTAAAVIAAMLAGFAAAKAPEIDPALVDTLVAQIMQQADRHAEQSQKPDGQAIRND
            ||||||||||||||||||||||||||||||||||||||||||||||||||:|||||||||
orf76ng     MKQKKTAAAVIAAMLAGFAAAKAPEIDPALVDTLVAQIMQQADRHAEQSQRPDGQAIRND
                    10        20        30        40        50        60

                    70        80        90       100       110       120
orf76-1.pep AVRRLQTLEVLKNRALKEGLDKDKDVQNRFKIAEASFYAEEYVRFLERSETVSEDELHKF
            ||||||||||||||||||||||||||||||||||||||||||||||||||||||: |::|
orf76ng     AVRRLQTLEVLKNRALKEGLDKDKDVQNRFKIAEASFYAEEYVRFLERSETVSESALRQF
                    70        80        90       100       110       120

                   130       140       150       160       170       180
orf76-1.pep YEQQIRMIKLQQVSFATEEEARQAQQLLLKGLSFEGLMKRYPNDEQAFDGFIMAQQLPEP
            ||:|||||||||||||||||||||||||||||||||||||||||||||||||||||||||
orf76ng     YERQIRMIKLQQVSFATEEEARQAQQLLLKGLSFEGLMKRYPNDEQAFDGFIMAQQLPEP
                   130       140       150       160       170       180

                   190       200       210       220       230       240
orf76-1.pep LASQFAAMNRGDVTRDPVKLGERYYLFKLSEVGKNPDAQPFELVRNQLEQGLRQEKARLK
            ||||||:||||||||:|||||||||||||: |||||||||||||||||||||||||||||
orf76ng     LASQFAGMNRGDVTRNPVKLGERYYLFKLGAVGKNPDAQPFELVRNQLEQGLRQEKARLK
                   190       200       210       220       230       240

                   250
orf76-1.pep IDALLEENGVKPX
            |||||||||||||
orf76ng     IDALLEENGVKPX
                   250



Furthermore, ORF76ng shows significant homology to a B.subtilis export protein precursor: 

sp|P24327|PRSA_BACSU PROTEIN EXPORT PROTEIN PRSA PRECURSOR >gi|98227|pir||S15269 
33K lipoprotein - Bacillus subtilis >gi|39782 (X57271) 33kDa lipoprotein 
[Bacillus subtilis] 
>gi|2226124|gnl|PID|e325181 (Y14077) 33kDa lipoprotein [Bacillus subtilis] 
>gi|2633331|gnl|PID|e1182997 (Z99109) molecular chaperonin [Bacillus subtilis] 
Length = 292 
Score = 50.4 bits (118), Expect = 1e-05 
Identities = 48/199 (24%), Positives = 82/199 (41%), Gaps = 32/199 (16%) 

Query: 70  VLKNRALKEGLDK-----DKDVQNRFKIAEASF----------YAEEYVRFLERSETVSE 114
           VL     ++ LDK     DK++ N+ K  +             Y ++Y++   + E +++
Sbjct: 53  VLTQLVQEKVLDKKYKVSDKEIDNKLKEYKTQLGDQYTALEKQYGKDYLKEQVKYELLTQ 112

Query: 115 SA-----------LRQFYERQIRMIKLQQVSFATEEEARQAQQLLLKGLSFEGLMKRYPN 163
            A           +++++E     I+   +  A ++ A + ++ L KG  FE L K Y
Sbjct: 113 KAAKDNIKVTDADIKEYWEGLKGKIRASHILVADKKTAEEVEKKLKKGEKFEDLAKEYST 172

Query: 164 DEQAFDG-----FIMAQQLPEPLASQFAAMNRGDVTRDPVKLGERYYLFKLSEVGKNPDA 218
           D  A  G     F    Q+ E  +     +  G+V+ DPVK    Y++ K +E     D
Sbjct: 173 DSSASKGGDLGWFAKEGQMDETFSKAAFKLKTGEVS-DPVKTQYGYHIIKKTEERGKYDD 231

Query: 219 QPFELVRNQLEQGLRQEKA 237
              EL    LEQ L    A
Sbjct: 232 MKKELKSEVLEQKLNDNAA 250
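
For readers who wish to verify the summary statistics, a minimal Python sketch follows. It parses the BLAST summary line quoted above and recomputes each percentage from its count and alignment length. The function name `parse_blast_summary` is hypothetical, and the use of integer (truncating) division is an assumption chosen because it reproduces the percentages printed in this report.

```python
import re

SUMMARY = "Identities = 48/199 (24%), Positives = 82/199 (41%), Gaps = 32/199 (16%)"


def parse_blast_summary(line: str) -> dict:
    """Return {label: (count, length, printed_percent)} for a BLAST summary line."""
    pattern = re.compile(r"(\w+)\s*=\s*(\d+)/(\d+)\s*\((\d+)%\)")
    return {
        label: (int(count), int(length), int(percent))
        for label, count, length, percent in pattern.findall(line)
    }


stats = parse_blast_summary(SUMMARY)
for label, (count, length, printed) in stats.items():
    # Truncating division matches the percentage printed in the report.
    recomputed = 100 * count // length
    print(f"{label}: {count}/{length} printed {printed}% recomputed {recomputed}%")
```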





Based on this analysis, including the presence of a putative leader sequence and an RGD motif in 
the gonococcal protein, it was predicted that the proteins from N.meningitidis and N.gonorrhoeae, 
and their epitopes, could be useful antigens for vaccines or diagnostics, or for raising antibodies. 
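
The position of the RGD motif can be located with a simple substring search. The sketch below is illustrative only: the helper `find_motif` is hypothetical, and the input is residues 151-200 of SEQ ID 302 (ORF76ng), with residues 191-192 read as GD in agreement with the codons GGC GAC at the corresponding position of SEQ ID 301.

```python
def find_motif(sequence: str, motif: str, first_residue: int = 1) -> list:
    """Return the sequence numbers at which `motif` starts.

    `first_residue` is the sequence number of the first character of
    `sequence`, so that a fragment can be searched with full-length numbering.
    """
    hits = []
    start = sequence.find(motif)
    while start != -1:
        hits.append(start + first_residue)
        start = sequence.find(motif, start + 1)
    return hits


# Residues 151-200 of ORF76ng (SEQ ID 302).
orf76ng_151_200 = "GLSFEGLMKRYPNDEQAFDGFIMAQQLPEPLASQFAGMNRGDVTRNPVKL"

print(find_motif(orf76ng_151_200, "RGD", first_residue=151))  # [190]
```

On this reading the motif occupies residues 190-192 of the gonococcal protein.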



ORF76-1 (27.8kDa) was cloned in the pET vector and expressed in E.coli, as described above. The 
products of protein expression and purification were analyzed by SDS-PAGE. Figure 10A shows 
the results of affinity purification of the His-fusion protein. Purified His-fusion protein was used 
to immunise mice, whose sera were used for Western blot (Figure 10B), ELISA (positive result), 
and FACS analysis (Figure 10C). These experiments confirm that ORF76-1 is a surface-exposed 
protein, and that it is a useful immunogen. 



Example 36 

The following partial DNA sequence was identified in N.meningitidis <SEQ ID 303>: 

1 ATGAAAAAAT CTTTCCTTAC GCTTGTTCTG TATTCGTCTT TACTTACCGC 

51 CAGCGAAATT GCCTTACCCC TTGGAATTGG GGATTGAAAC CTTACCGGCG 

101 GCAAAAATTG CGGAAACGTT TGCGCTGACA TTTGTGATTG CTGCGCTGTA 

151 TCTGTTTGCG CGTAATAAGG TGACGCGTTT GTTGATTGCG GTGTTTTTTG 

201 CGTTCAGCAT TATTGCCAAC AATGTGCATT ACGCGGATTA TCAAAGCTGG 

251 ATGACG 

// 

1201 CAAACCGTAT TCGAGCAGCT GCAAAAGACT CCTGACGGCA 

1251 ACTGGCTGTT TGCCTATACC TCCGATCATG GCCAGTATGT TCGCCAAGAT 

1301 ATCTACAATC AAGGCACGGT GCAGCCCGAC AGCTATCTCG TGCCGCTAGT 

1351 GTTGTACAGC CCGGATAAGG CCGTGCAACA GGCTGCCAAC CAGGCTTTTG 

1401 CGCCTTGCGA GATTGCCTTC CATCAGCAGC TTTCAACGTT CCTGATTCAC 

1451 ACGTTGGGCT ACGATATGCC GGTTTCAGGT TGTCGCGAAG GCTCGGTAAC 

1501 GGGCAACCTG ATTACGGGTG ATGCAGGCAG CTTGAACATT CGCGACGGCA 

1551 AGGCGGAATA TGTTTATCCG CAATGA 

This corresponds to the amino acid sequence <SEQ ID 304; ORF81>: 



1 MKKSFLTLVL YSSLLTASEI AYPLELGIET LPAAKIAETF ALTFVIAALY 






51 LFARNKVTRL LIAVFFAFSI IANNVHYADY QSWMT 

// 

401 ...QTVFEQL QKTPDGNWLF AYTSDHGQYV RQDIYNQGTV QPDSYLVPLV 
451 LYSPDKAVQQ AANQAFAPCE IAFHQQLSTF LIHTLGYDMP VSGCREGSVT 
501 GNLITGDAGS LNIRDGKAEY VYPQ* 

Further work revealed the complete nucleotide sequence <SEQ ID 305>: 



1 ATGAAAAAAT CTTTCCTTAC GCTTGTTCTG TATTCGTCTT TACTTACCGC 

51 CAGCGAAATT GCCTATCGCT TTGTATTTGG GATTGAAACC TTACCGGCGG 

101 CAAAAATTGC GGAAACGTTT GCGCTGACAT TTGTGATTGC TGCGCTGTAT 

151 CTGTTTGCGC GTTATAAGGT GACGCGTTTG TTGATTGCGG TGTTTTTTGC 

201 GTTCAGCATT ATTGCCAACA ATGTGCATTA CGCGGTTTAT CAAAGCTGGA 

251 TGACGGGCAT CAATTATTGG CTGATGCTGA AAGAGGTTAC CGAAGTCGGC 

301 AGCGCGGGTG CGTCGATGTT GGATAAGTTG TGGCTGCCTG TGTTGTGGGG 

351 CGTGTTGGAA GTCATGTTGT TTTGCAGCCT TGCCAAGTTC CGCCGTAAGA 

401 CGCATTTTTC TGCCGATATA CTGTTTGCCT TCCTAATGCT GATGATTTTC 

451 GTGCGTTCGT TCGACACGAA ACAAGAGCAC GGTATTTCGC CCAAACCGAC 

501 ATACAGCCGC ATCAAAGCCA ATTATTTCAG CTTCGGTTAT TTTGTCGGAC 

551 GCGTGTTGCC GTATCAGTTG TTTGATTTAA GCAGGATTCC CGCCTTTAAG 

601 CAGCCTGCTC CAAGCAAAAT CGGGCAGGGC AGTGTTCAAA ATATCGTCCT 

651 GATTATGGGC GAAAGCGAAA GCGCGGCGCA TTTGAAGCTG TTTGGCTACG 

701 GACGCGAAAC TTCGCCGTTT TTAACCCGGC TGTCGCAAGC CGATTTTAAG 

751 CCGATTGTGA AACAAAGTTA TTCCGCAGGC TTTATGACTG CAGTGTCCCT 

801 GCCCAGTTTT TTCAATGCGA TACCGCACGC CAACGGCTTG GAACAAATCA 

851 GCGGCGGCGA TACCAATATG TTCCGCCTCG CCAAAGAGCA GGGCTATGAA 

901 ACGTATTTTT ACAGCGCGCA GGCGGAAAAC GAGATGGCGA TTTTGAACTT 

951 AATCGGTAAG AAATGGATAG ACCATCTGAT TCAGCCGACG CAACTTGGCT 

1001 ACGGCAACGG CGACAATATG CCCGATGAGA AGCTGCTGCC GTTGTTCGAC 

1051 AAAATCAATT TGCAGCAGGG CAAGCATTTT ATCGTGTTGC ACCAACGCGG 

1101 TTCGCACGCC CCATACGGCG CATTGTTGCA GCCTCAAGAT AAAGTATTCG 

1151 GCGAAGCCGA TATTGTGGAT AAGTACGACA ACACCATCCA CAAAACCGAC 

1201 CAAATGATTC AAACCGTATT CGAGCAGCTG CAAAAGCAGC CTGACGGCAA 

1251 CTGGCTGTTT GCCTATACCT CCGATCATGG CCAGTATGTT CGCCAAGATA 

1301 TCTACAATCA AGGCACGGTG CAGCCCGACA GCTATCTCGT GCCGCTAGTG 

1351 TTGTACAGCC CGGATAAGGC CGTGCAACAG GCTGCCAACC AGGCTTTTGC 

1401 GCCTTGCGAG ATTGCCTTCC ATCAGCAGCT TTCAACGTTC CTGATTCACA 

1451 CGTTGGGCTA CGATATGCCG GTTTCAGGTT GTCGCGAAGG CTCGGTAACG 

1501 GGCAACCTGA TTACGGGTGA TGCAGGCAGC TTGAACATTC GCGACGGCAA 

1551 GGCGGAATAT GTTTATCCGC AATGA 

This corresponds to the amino acid sequence <SEQ ID 306; ORF81-1>: 

1 MKKSFLTLVL YSSLLTASEI AYRFVFGIET LPAAKIAETF ALTFVIAALY 
51 LFARYKVTRL LIAVFFAFSI IANNVHYAVY QSWMTGINYW LMLKEVTEVG 
101 SAGASMLDKL WLPVLWGVLE VMLFCSLAKF RRKTHFSADI LFAFLMLMIF 
151 VRSFDTKQEH GISPKPTYSR IKANYFSFGY FVGRVLPYQL FDLSRIPAFK 
201 QPAPSKIGQG SVQNIVLIMG ESESAAHLKL FGYGRETSPF LTRLSQADFK 
251 PIVKQSYSAG FMTAVSLPSF FNAIPHANGL EQISGGDTNM FRLAKEQGYE 
301 TYFYSAQAEN EMAILNLIGK KWIDHLIQPT QLGYGNGDNM PDEKLLPLFD 
351 KINLQQGKHF IVLHQRGSHA PYGALLQPQD KVFGEADIVD KYDNTIHKTD 
401 QMIQTVFEQL QKQPDGNWLF AYTSDHGQYV RQDIYNQGTV QPDSYLVPLV 
451 LYSPDKAVQQ AANQAFAPCE IAFHQQLSTF LIHTLGYDMP VSGCREGSVT 
501 GNLITGDAGS LNIRDGKAEY VYPQ* 


Computer analysis of this amino acid sequence gave the following results: 



Homology with a predicted ORF from N.meningitidis (strain A) 

ORF81 shows 84.7% identity over a 85aa overlap and 99.2% identity over a 121aa overlap with 
an ORF (ORF81a) from strain A of N. meningitidis: 

                    10        20        30        40        50        60
orf81.pep   MKKSFLTLVLYSSLLTASEIAYPLELGIETLPAAKIAETFALTFVIAALYLFARNKVTRL
            ||||:::| ||||||||||||| : :|||||||||:|||||||||||||||||| |:|||
orf81a      MKKSLFVLFLYSSLLTASEIAYRFVFGIETLPAAKMAETFALTFVIAALYLFARYKATRL
                    10        20        30        40        50        60

                    70        80
orf81.pep   LIAVFFAFSIIANNVHYADYQSWMT
            |||||||||||||||||| ||||:|
orf81a      LIAVFFAFSIIANNVHYAVYQSWITGINYWLMLKEITEVGGAGASMLDKLWLPALWGVLE
                    70        80        90       100       110       120

//

                                             120       130       140
orf81.pep                                 QTVFEQLQKTPDGNWLFAYTSDHGQYVRQD
                                          ||||||||| ||||||||||||||||||||
orf81a      IPHANGLEQISGGDIVDKYDNTIHKTDQMIQTVFEQLQKQPDGNWLFAYTSDHGQYVRQD
                280       290       300       310       320       330

               150       160       170       180       190       200
orf81.pep   IYNQGTVQPDSYLVPLVLYSPDKAVQQAANQAFAPCEIAFHQQLSTFLIHTLGYDMPVSG
            ||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
orf81a      IYNQGTVQPDSYLVPLVLYSPDKAVQQAANQAFAPCEIAFHQQLSTFLIHTLGYDMPVSG
                340       350       360       370       380       390

               210       220       230
orf81.pep   CREGSVTGNLITGDAGSLNIRDGKAEYVYPQX
            ||||||||||||||||||||||||||||||||
orf81a      CREGSVTGNLITGDAGSLNIRDGKAEYVYPQX
                400       410       420

The complete length ORF81a nucleotide sequence <SEQ ID 307> is: 

1 ATGAAAAAAT CCCTTTTCGT TCTCTTTCTG TATTCGTCCC TACTTACTGC 
51 CAGCGAAATT GCTTATCGCT TTGTATTCGG AATTGAAACC TTACCGGCTG 
101 CAAAAATGGC AGAAACGTTT GCGCTGACAT TTGTGATTGC TGCGCTGTAT 
151 CTGTTTGCGC GTTATAAGGC AACGCGTTTG TTGATTGCGG TGTTTTTCGC 
201 GTTCAGCATT ATTGCCAACA ATGTGCATTA CGCGGTTTAT CAAAGCTGGA 
251 TAACGGGCAT TAATTATTGG CTGATGCTGA AAGAGATTAC CGAAGTTGGC 
301 GGCGCAGGGG CGTCGATGTT GGATAAGTTG TGGCTGCCTG CGTTGTGGGG 
351 CGTGTTGGAA GTCATGTTGT TTTGCAGCCT TGCCAAGTTC CGCCGTAAGA 
401 CGCATTTTTC TGCCGATATA CTGTTTGCCT TCCTAATGCT GATGATTTTC 
451 GTGCGTTCGT TCGACACGAA ACAAGAACAC GGTATTTCGC CCAAACCGAC 
501 ATACAGCCGC ATCAAAGCCA ATTATTTCAG CTTCGGTTAT TTTGTCGGAC 
551 GCGTGTTGCC GTATCAGTTG TTTGATTTAA GCAAGATTCC TGTGTTCAAA 
601 CAGCCTGCTC CAAGCAGAAT CGGGCAAGGC AGTATTCAAA ATATCGTCCT 
651 GATTATGGGC GAAAGCGAAA GCGCGGCGCA TTTGAAATTG TTTGGCTACG 
701 GGCGCGAAAC TTCGCCGTTT TTGACCCAGC TTTCGCAAGC CGATTTTAAG 
751 CCGATTGTGA AACAAAGTTA TTCCGCAGGC TTTATGACGG CAGTATCCCT 
801 GCCCAGTTTC TTTAACGTCA TACCGCATGC CAACGGCTTG GAACAAATCA 
851 GCGGCGGCGA TATTGTGGAT AAGTACGACA ACACCATCCA CAAAACCGAC 
901 CAAATGATTC AAACCGTATT CGAGCAGCTG CAAAAGCAGC CTGACGGCAA 
951 CTGGCTGTTT GCCTATACCT CCGATCATGG CCAGTATGTT CGCCAAGATA 
1001 TCTACAATCA AGGCACGGTG CAGCCCGACA GCTATCTCGT GCCGCTGGTG 
1051 TTGTACAGCC CGGATAAGGC CGTGCAACAG GCTGCCAACC AGGCTTTTGC 
1101 GCCTTGCGAG ATTGCCTTCC ATCAGCAGCT TTCAACGTTC CTGATTCACA 
1151 CGTTGGGCTA CGATATGCCG GTTTCAGGTT GTCGCGAAGG CTCGGTAACG 
1201 GGCAACCTGA TTACGGGTGA TGCAGGCAGC TTGAACATTC GCGACGGCAA 
1251 GGCGGAATAT GTTTATCCGC AATGA 


This encodes a protein having amino acid sequence <SEQ ID 308>: 



1 MKKSLFVLFL YSSLLTASEI AYRFVFGIET LPAAKMAETF ALTFVIAALY 

51 LFARYKATRL LIAVFFAFSI IANNVHYAVY QSWITGINYW LMLKEITEVG 

101 GAGASMLDKL WLPALWGVLE VMLFCSLAKF RRKTHFSADI LFAFLMLMIF 

151 VRSFDTKQEH GISPKPTYSR IKANYFSFGY FVGRVLPYQL FDLSKIPVFK 

201 QPAPSRIGQG SIQNIVLIMG ESESAAHLKL FGYGRETSPF LTQLSQADFK 

251 PIVKQSYSAG FMTAVSLPSF FNVIPHANGL EQISGGDIVD KYDNTIHKTD 

301 QMIQTVFEQL QKQPDGNWLF AYTSDHGQYV RQDIYNQGTV QPDSYLVPLV 

351 LYSPDKAVQQ AANQAFAPCE IAFHQQLSTF LIHTLGYDMP VSGCREGSVT 

401 GNLITGDAGS LNIRDGKAEY VYPQ* 

ORF81a and ORF81-1 show 77.9% identity in 524 aa overlap: 



                    10        20        30        40        50        60
orf81a.pep  MKKSLFVLFLYSSLLTASEIAYRFVFGIETLPAAKMAETFALTFVIAALYLFARYKATRL
            ||||:::| ||||||||||||||||||||||||||:||||||||||||||||||||:|||
orf81-1     MKKSFLTLVLYSSLLTASEIAYRFVFGIETLPAAKIAETFALTFVIAALYLFARYKVTRL
                    10        20        30        40        50        60

                    70        80        90       100       110       120
orf81a.pep  LIAVFFAFSIIANNVHYAVYQSWITGINYWLMLKEITEVGGAGASMLDKLWLPALWGVLE
            |||||||||||||||||||||||:|||||||||||:||||:||||||||||||:||||||
orf81-1     LIAVFFAFSIIANNVHYAVYQSWMTGINYWLMLKEVTEVGSAGASMLDKLWLPVLWGVLE
                    70        80        90       100       110       120

                   130       140       150       160       170       180
orf81a.pep  VMLFCSLAKFRRKTHFSADILFAFLMLMIFVRSFDTKQEHGISPKPTYSRIKANYFSFGY
            ||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
orf81-1     VMLFCSLAKFRRKTHFSADILFAFLMLMIFVRSFDTKQEHGISPKPTYSRIKANYFSFGY
                   130       140       150       160       170       180

                   190       200       210       220       230       240
orf81a.pep  FVGRVLPYQLFDLSKIPVFKQPAPSRIGQGSIQNIVLIMGESESAAHLKLFGYGRETSPF
            ||||||||||||||:||:|||||||:|||||:||||||||||||||||||||||||||||
orf81-1     FVGRVLPYQLFDLSRIPAFKQPAPSKIGQGSVQNIVLIMGESESAAHLKLFGYGRETSPF
                   190       200       210       220       230       240

                   250       260       270       280
orf81a.pep  LTQLSQADFKPIVKQSYSAGFMTAVSLPSFFNVIPHANGLEQISGGD-------------
            ||:|||||||||||||||||||||||||||||:||||||||||||||
orf81-1     LTRLSQADFKPIVKQSYSAGFMTAVSLPSFFNAIPHANGLEQISGGDTNMFRLAKEQGYE
                   250       260       270       280       290       300

orf81a.pep  ------------------------------------------------------------

orf81-1     TYFYSAQAENEMAILNLIGKKWIDHLIQPTQLGYGNGDNMPDEKLLPLFDKINLQQGKHF
                   310       320       330       340       350       360

                                       290       300       310       320
orf81a.pep  ---------------------------IVDKYDNTIHKTDQMIQTVFEQLQKQPDGNWLF
                                       |||||||||||||||||||||||||||||||||
orf81-1     IVLHQRGSHAPYGALLQPQDKVFGEADIVDKYDNTIHKTDQMIQTVFEQLQKQPDGNWLF
                   370       380       390       400       410       420

                   330       340       350       360       370       380
orf81a.pep  AYTSDHGQYVRQDIYNQGTVQPDSYLVPLVLYSPDKAVQQAANQAFAPCEIAFHQQLSTF
            ||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
orf81-1     AYTSDHGQYVRQDIYNQGTVQPDSYLVPLVLYSPDKAVQQAANQAFAPCEIAFHQQLSTF
                   430       440       450       460       470       480

                   390       400       410       420
orf81a.pep  LIHTLGYDMPVSGCREGSVTGNLITGDAGSLNIRDGKAEYVYPQX
            |||||||||||||||||||||||||||||||||||||||||||||
orf81-1     LIHTLGYDMPVSGCREGSVTGNLITGDAGSLNIRDGKAEYVYPQX
                   490       500       510       520
490 500 510 520 

Homology with a predicted ORF from N.gonorrhoeae 

The aligned aa sequences of the N- and C-termini of ORF81 and a predicted ORF (ORF81.ng) from 
N.gonorrhoeae show 82.4% and 97.5% identity in 85 and 121 aa overlap, respectively: 



orf81.pep   MKKSFLTLVLYSSLLTASEIAYPLELGIETLPAAKIAETFALTFVIAALYLFARNKVTRL  60
            ||||:::| ||||||||||||| : :|||||||||:||||||||:||||||||| |::||
orf81ng     MKKSLFVLFLYSSLLTASEIAYRFVFGIETLPAAKMAETFALTFMIAALYLFARYKASRL  60

orf81.pep   LIAVFFAFSIIANNVHYADYQSWMT  85
            |||||||||:|||||||| ||||||
orf81ng     LIAVFFAFSMIANNVHYAVYQSWMTGINYWLMLKEVTEVGSAGASMLDKLWLPALWGVAE  120

//

orf81.pep                                 QTVFEQLQKTPDGNWLFAYTSDHGQYVRQD  433
                                          ||||||||| ||||||||||||||||||||
orf81ng     ALLQPQDKVFGEADIVDKYDNTIHKTDQMIQTVFEQLQKQPDGNWLFAYTSDHGQYVRQD  433

orf81.pep   IYNQGTVQPDSYLVPLVLYSPDKAVQQAANQAFAPCEIAFHQQLSTFLIHTLGYDMPVSG  493
            ||||||||||||:|||||||||||||||||||||||||||||||||||||||||||||||
orf81ng     IYNQGTVQPDSYIVPLVLYSPDKAVQQAANQAFAPCEIAFHQQLSTFLIHTLGYDMPVSG  493

orf81.pep   CREGSVTGNLITGDAGSLNIRDGKAEYVYPQ  524
            |||||||||||||||||||||:|||||||||
orf81ng     CREGSVTGNLITGDAGSLNIRNGKAEYVYPQ  524

The complete length ORF81ng nucleotide sequence <SEQ ID 309> is: 

1 ATGAAAAAAT CCCTTTTCGT TCTCTTTCTG TATTCATCCC TACTTACCGC 

51 CAGCGAAATC GCCTATCGCT TTGTATTCGG AATTGAAACC TTACCGGCTG 

101 CAAAAATGGC GGAAACGTTT GCGCTGACAT TTATGATTGC TGCGCTGTAT 

151 CTGTTTGCGC GTTATAAGGC TTCGCGGCTG CTGATTGCGG TGTTTTTCGC 

201 GTTCAGCATG ATTGCCAACA ATGTGCATTA CGCGGTTTAT CAAAGCTGGA 

251 TGACGGGTAT TAACTATTGG CTGATGCTGA AAGAGGTTAC CGAAGTCGGC 

301 AGCGCGGGCG CGTCGATGTT GGATAAGTTG TGGCTGCCTG CTTTGTGGGG 

351 CGTGGCGGAA GTCATGTTGT TTTGCAGCCT TGCCAAGTTC CGCCGTAAGA 

401 CGCATTTTTC TGCCGATATA CTGTTTGCCT TCCTAATGCT GATGATTTTC 

451 GTGCGTTCGT TCGACACGAA ACAAGAGCAC GGTATTTCGC CCAAACCGAC 

501 ATACAGCCGC ATCAAAGCCA ATTATTTCAG CTTCGGTTAT TTTGTCGGGC 

551 GCGTGTTGCC GTATCAGTTG TTTGATTTAA GCAAGATCCC TGTGTTCAAA 

601 CAGCCTGCTC CAAGCAAAAT CGGGCAAGGC AGTATTCAAA ATATCGTCCT 

651 GATTATGGGC GAAAGCGAAA GCGCGGCGCA TTTGAAATTG TTTGGTTACG 

701 GGCGCGAAAC TTCGCCGTTT TTAACCCGGC TGTCGCAAGC CGATTTTAAG 

751 CCGATTGTGA AACAAAGTTA TTCCGCAGGC TTTATGACGG CAGTATCCCT 

801 GCCCAGTTTC TTTAACGTCA TACCGCACGC CAACGGCTTG GAACAAATCA 

851 GCGGCGGCGA TACCAATATG TTCCGCCTCG CCAAAGAGCA GGGCTATGAA 

901 ACGTATTTTT ACAGTGCCCA GGCTGAAAAC CAAATGGCAA TTTTGAACTT 

951 AATCGGTAAG AAATGGATAG ACCATCTGAT TCAGCCGACG CAACTTGGCT 

1001 ACGGCAACGG CGACAATATG CCCGATGAGA AGCTGCTGCC GTTGTTCGAC 

1051 AAAATCAATT TGCAGCAGGG CAGGCATTTT ATCGTGTTGC ACCAACGCGG 

1101 TTCGCACGCC CCATACGGCG CATTGTTGCA GCCTCAAGAT AAAGTATTCG 

1151 GCGAAGCCGA TATTGTGGAT AAGTACGACA ACACCATCCA CAAAACCGAC 

1201 CAAATGATTC AAACCGTATT CGAGCAGCTG CAAAAGCAGC CTGACGGCAA 

1251 CTGGCTGTTT GCCTATACCT CCGATCATGG CCAGTATGTG CGCCAAGATA 

1301 TCTACAATCA AGGCACGGTG CAGCCCGACA GCTATATTGT GCCTCTGGTT 

1351 TTGTACAGCC CGGATAAGGC CGTGCAACAG GCTGCCAACC AGGCTTTTGC 

1401 GCCTTGCGAG ATTGCCTTCC ATCAGCAGCT TTCAACGTTC CTGATTCACA 

1451 CGTTGGGCTA CGATATGCCG GTTTCAGGTT GTCGCGAAGG CTCGGTAACA 

1501 GGCAACCTGA TTACGGGCGA TGCAGGCAGC TTGAACATTC GCAACGGCAA 

1551 GGCGGAATAT GTTTATCCGC AATAA 

This encodes a protein having amino acid sequence <SEQ ID 310>: 

1 MKKSLFVLFL YSSLLTASEI AYRFVFGIET LPAAKMAETF ALTFMIAALY 

51 LFARYKASRL LIAVFFAFSM IANNVHYAVY QSWMTGINYW LMLKEVTEVG 

101 SAGASMLDKL WLPALWGVAE VMLFCSLAKF RRKTHFSADI LFAFLMLMIF 

151 VRSFDTKQEH GISPKPTYSR IKANYFSFGY FVGRVLPYQL FDLSKIPVFK 

201 QPAPSKIGQG SIQNIVLIMG ESESAAHLKL FGYGRETSPF LTRLSQADFK 

251 PIVKQSYSAG FMTAVSLPSF FNVIPHANGL EQISGGDTNM FRLAKEQGYE 

301 TYFYSAQAEN QMAILNLIGK KWIDHLIQPT QLGYGNGDNM PDEKLLPLFD 

351 KINLQQGRHF IVLHQRGSHA PYGALLQPQD KVFGEADIVD KYDNTIHKTD 

401 QMIQTVFEQL QKQPDGNWLF AYTSDHGQYV RQDIYNQGTV QPDSYIVPLV 

451 LYSPDKAVQQ AANQAFAPCE IAFHQQLSTF LIHTLGYDMP VSGCREGSVT 

501 GNLITGDAGS LNIRNGKAEY VYPQ* 

ORF81ng and ORF81-1 show 96.4% identity in 524 aa overlap: 

                       10        20        30        40        50        60
orf81ng-1.pep  MKKSLFVLFLYSSLLTASEIAYRFVFGIETLPAAKMAETFALTFMIAALYLFARYKASRL
               ||||:::| ||||||||||||||||||||||||||:||||||||:|||||||||||::||
orf81-1        MKKSFLTLVLYSSLLTASEIAYRFVFGIETLPAAKIAETFALTFVIAALYLFARYKVTRL
                       10        20        30        40        50        60

                       70        80        90       100       110       120
orf81ng-1.pep  LIAVFFAFSMIANNVHYAVYQSWMTGINYWLMLKEVTEVGSAGASMLDKLWLPALWGVAE
               |||||||||:|||||||||||||||||||||||||||||||||||||||||||:|||| |
orf81-1        LIAVFFAFSIIANNVHYAVYQSWMTGINYWLMLKEVTEVGSAGASMLDKLWLPVLWGVLE
                       70        80        90       100       110       120

                      130       140       150       160       170       180
orf81ng-1.pep  VMLFCSLAKFRRKTHFSADILFAFLMLMIFVRSFDTKQEHGISPKPTYSRIKANYFSFGY
               ||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
orf81-1        VMLFCSLAKFRRKTHFSADILFAFLMLMIFVRSFDTKQEHGISPKPTYSRIKANYFSFGY
                      130       140       150       160       170       180

                      190       200       210       220       230       240
orf81ng-1.pep  FVGRVLPYQLFDLSKIPVFKQPAPSKIGQGSIQNIVLIMGESESAAHLKLFGYGRETSPF
               ||||||||||||||:||:|||||||||||||:||||||||||||||||||||||||||||
orf81-1        FVGRVLPYQLFDLSRIPAFKQPAPSKIGQGSVQNIVLIMGESESAAHLKLFGYGRETSPF
                      190       200       210       220       230       240

                      250       260       270       280       290       300
orf81ng-1.pep  LTRLSQADFKPIVKQSYSAGFMTAVSLPSFFNVIPHANGLEQISGGDTNMFRLAKEQGYE
               ||||||||||||||||||||||||||||||||:|||||||||||||||||||||||||||
orf81-1        LTRLSQADFKPIVKQSYSAGFMTAVSLPSFFNAIPHANGLEQISGGDTNMFRLAKEQGYE
                      250       260       270       280       290       300

                      310       320       330       340       350       360
orf81ng-1.pep  TYFYSAQAENQMAILNLIGKKWIDHLIQPTQLGYGNGDNMPDEKLLPLFDKINLQQGRHF
               ||||||||||:||||||||||||||||||||||||||||||||||||||||||||||:||
orf81-1        TYFYSAQAENEMAILNLIGKKWIDHLIQPTQLGYGNGDNMPDEKLLPLFDKINLQQGKHF
                      310       320       330       340       350       360

                      370       380       390       400       410       420
orf81ng-1.pep  IVLHQRGSHAPYGALLQPQDKVFGEADIVDKYDNTIHKTDQMIQTVFEQLQKQPDGNWLF
               ||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
orf81-1        IVLHQRGSHAPYGALLQPQDKVFGEADIVDKYDNTIHKTDQMIQTVFEQLQKQPDGNWLF
                      370       380       390       400       410       420

                      430       440       450       460       470       480
orf81ng-1.pep  AYTSDHGQYVRQDIYNQGTVQPDSYIVPLVLYSPDKAVQQAANQAFAPCEIAFHQQLSTF
               |||||||||||||||||||||||||:||||||||||||||||||||||||||||||||||
orf81-1        AYTSDHGQYVRQDIYNQGTVQPDSYLVPLVLYSPDKAVQQAANQAFAPCEIAFHQQLSTF
                      430       440       450       460       470       480

                      490       500       510       520
orf81ng-1.pep  LIHTLGYDMPVSGCREGSVTGNLITGDAGSLNIRNGKAEYVYPQX
               ||||||||||||||||||||||||||||||||||:||||||||||
orf81-1        LIHTLGYDMPVSGCREGSVTGNLITGDAGSLNIRDGKAEYVYPQX
                      490       500       510       520

Furthermore, ORF81ng shows significant homology to an E.coli OMP: 

gi|1256380 (U50906) outer membrane adherence protein-associated protein [E. coli] 
Length = 547 
Score = 87.4 bits (213), Expect = 2e-16 
Identities = 122/468 (26%), Positives = 198/468 (42%), Gaps = 70/468 (14%) 



Query: 


25 


Sbjct: 


29 


Query: 


82 


Sbjct: 


87 


Query: 


135 


Sbjct: 


142 


Query: 


184 


Sbjct: 


202 


Query: 


242 


Sbjct: 


258 


Query ; 


299 


Sbjct: 


311 


Query : 


356 


Sbjct: 


360 



VFGIETLPAAKMAETFA-LTFMIAALYLFARYKAS — RLLIAVFFAFSMIANNVHYAVYQ 81 
VFGI LA+A LF+++R + RLL+A F + A ++ ++Y 

VFGITNLVASSGAHMVQRLLFFVLTILWKRISSLPLRLLVAAPFVL-LTAADMSISLY- 86 

SWMT GINYWLMLKEVTEVGSAGASMLDKLWLPALWGVAEVMLFCSLAKFRRKT 134 

SW T G ++ + EV A ML ++ P L A + L + 

SWCTFGTTFNDGFAISVLQSDPDEV AKMLG-MYSPYLCAFAFLSLLFLAVIIKYDV 141 



+Q 



+Q 



L+L++ 



+ +P F+ 



Q+ S 



TA+S+P 



-DTKQEHGISPKPTYSRIKAN—YFSFGYFVG 183 
D K ++ SP SR +F+ YF 



VLI+GES ++ L+GY R T+P + 



-f +V+ H I N+ +A + G 

LTADSVLSH DIHNYPDNIINMANQAG 310 



++T++ S+Q4- +N A+ ++ 



++ + Y G DE LLP + Q 
-AMRAMETVYVRGF DELLLPHLSQALQQ 359 



Q + IVLH GSH P + 



VF 



D D YDN+IH TD ++ VFE L+ 



418 

Query: 413 QPDGNWLFAYTSDHG---QYVRQDIYNQG--TVQPDSYIVPL-VLYSP 454
             D      Y +DHG    + ++++Y  G       +Y VP+ + YSP
Sbjct: 419 --DRRASVMYFADHGLERDFTKKNVYFHGGREASQQAYHVPMFIWYSP 464

Based on this analysis, including the presence of a putative leader sequence (double-underlined) 
and several putative transmembrane domains (single-underlined) in the gonococcal protein, it is 
predicted that the proteins from N.meningitidis and N.gonorrhoeae, and their epitopes, could be 
useful antigens for vaccines or diagnostics, or for raising antibodies. 
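
Putative leader sequences and transmembrane domains are hydrophobic stretches, and one common way to visualise them is a sliding-window hydropathy profile. The Python sketch below is illustrative only and is not the method used in the original analysis: it assumes the Kyte-Doolittle hydropathy scale and a 19-residue window, and the function name `hydropathy_profile` is hypothetical. The input is residues 1-60 of SEQ ID 310 (ORF81ng).

```python
# Kyte-Doolittle hydropathy index for the 20 standard amino acids.
KD = {
    "A": 1.8, "R": -4.5, "N": -3.5, "D": -3.5, "C": 2.5, "Q": -3.5, "E": -3.5,
    "G": -0.4, "H": -3.2, "I": 4.5, "L": 3.8, "K": -3.9, "M": 1.9, "F": 2.8,
    "P": -1.6, "S": -0.8, "T": -0.7, "W": -0.9, "Y": -1.3, "V": 4.2,
}


def hydropathy_profile(sequence: str, window: int = 19) -> list:
    """Mean hydropathy of every full window along the sequence."""
    values = [KD[residue] for residue in sequence]
    return [
        sum(values[i:i + window]) / window
        for i in range(len(values) - window + 1)
    ]


# Residues 1-60 of ORF81ng (SEQ ID 310).
orf81ng_1_60 = "MKKSLFVLFLYSSLLTASEIAYRFVFGIETLPAAKMAETFALTFMIAALYLFARYKASRL"

profile = hydropathy_profile(orf81ng_1_60)
for start, score in enumerate(profile, start=1):
    # Print the first residue of each window and its mean hydropathy.
    print(start, round(score, 2))
```

Windows with a clearly positive mean indicate hydrophobic segments; any cut-off used to call a transmembrane domain would be a further assumption.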

Example 37 

The following partial DNA sequence was identified in N.meningitidis <SEQ ID 311>: 

1 . . .ACCCTGCTCC TCTTCATCCC CCTCGTCCTC ACACGTGCG GCACACTGAC 

51 CGGCATACTC GCCCaCGGCG GCGGCAAACG CTTTGCCGTC GAACAAGAAC 

101 TCGTCGCCGC ATCGTCCCGC GCCGCCGTCA AAGAAATGGA TTTGTCCGCC 

151 yTAAAAGGAC GCAAAGCCGC CyTTTACGTC TCCGTTATGG GCGACCAAGG 

201 TTCGGGCAAC ATAAGCGGCG GACGCTACTC TATCGACGCA CTGATACGCG 

251 GCGGCTACCA CAACAACCCC GAAAGTGCCA CCCAATACAG CTACCCCGCC 

301 TACGACACTA CCGCCACCAC CAAATCCGAC GCGCTCTCCA GCGTAACCAC 

351 TTCCACATCG CTTTTGAACG CCCCCGCCGC CGyCyTGACG AAAAACAGCG 

401 GACGCAAAGG CGAACGcTCC GCCGGACTGT CCGTCAACGG CACGGGCGAC 

451 TACCGCAACG AAACCCTGCT CGCCAACCCC CGCGACGTTT CCTTCCTGAC 

501 CAACCTCATC CAAACCGTCT TCTACCTGCG CGGCATCGAA GTCgTACCGC 

551 CCGrATACGC CGACACCGAC GTATTCGTAA CCGTCGACGT A. . . 

This corresponds to the amino acid sequence <SEQ ID 312; ORF83>: 

1 ..TLLLFIPLVL TXCGTLTGIL AHGGGKRFAV EQELVAASSR AAVKEMDLSA 

51 LKGRKAAXYV SVMGDQGSGN ISGGRYSIDA LIRGGYHNNP ESATQYSYPA 

101 YDTTATTKSD ALSSVTTSTS LLNAPAAXLT KNSGRKGERS AGLSVNGTGD 

151 YRNETLLANP RDVSFLTNLI QTVFYLRGIE VVPPXYADTD VFVTVDV.. 
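
The X residues in this partial sequence coincide with codons of SEQ ID 311 that contain the lower-case ambiguity symbols y and r. A minimal Python sketch of how such codons can be resolved is given below. It assumes the standard genetic code and the IUPAC meanings Y = C or T and R = A or G; the helper name `translate_codon` is hypothetical. A residue is reported only when every possible expansion of the codon encodes the same amino acid.

```python
from itertools import product

# Standard genetic code, built from the conventional T, C, A, G ordering.
BASES = "TCAG"
AMINO = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    a + b + c: AMINO[16 * i + 4 * j + k]
    for i, a in enumerate(BASES)
    for j, b in enumerate(BASES)
    for k, c in enumerate(BASES)
}

# Subset of the IUPAC nucleotide codes: Y = pyrimidine, R = purine, N = any base.
IUPAC = {"A": "A", "C": "C", "G": "G", "T": "T", "Y": "CT", "R": "AG", "N": "ACGT"}


def translate_codon(codon: str) -> str:
    """Return the residue if all expansions of the codon agree, otherwise 'X'."""
    options = [IUPAC[base] for base in codon.upper()]
    residues = {CODON_TABLE["".join(choice)] for choice in product(*options)}
    return residues.pop() if len(residues) == 1 else "X"


# Codons taken from SEQ ID 311 at nucleotides 151-153, 172-174, 553-555 and 382-384.
for codon in ("yTA", "yTT", "GrA", "GyC"):
    print(codon, translate_codon(codon))
```

Here `yTA` still resolves to leucine because both CTA and TTA encode it, whereas `yTT`, `GrA` and `GyC` each expand to two different residues and are reported as X.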

Further work revealed the complete nucleotide sequence <SEQ ID 313>: 



1 ATGAAAACCC TGCTCCTCCT CATCCCCCTC GTCCTCACAG CCTGCGGCAC 

51 ACTGACCGGC ATACCCGCCC ACGGCGGCGG CAAACGCTTT GCCGTCGAAC 

101 AAGAACTCGT CGCCGCATCG TCCCGCGCCG CCGTCAAAGA AATGGATTTG 

151 TCCGCCCTAA AAGGACGCAA AGCCGCCCTT TACGTCTCCG TTATGGGCGA 

201 CCAAGGTTCG GGCAACATAA GCGGCGGACG CTACTCTATC GACGCACTGA 

251 TACGCGGCGG CTACCACAAC AACCCCGAAA GTGCCACCCA ATACAGCTAC 

301 CCCGCCTACG ACACTACCGC CACCACCAAA TCCGACGCGC TCTCCAGCGT 

351 AACCACTTCC ACATCGCTTT TGAACGCCCC CGCCGCCGCC CTGACGAAAA 

401 ACAGCGGACG CAAAGGCGAA CGCTCCGCCG GACTGTCCGT CAACGGCACG 

451 GGCGACTACC GCAACGAAAC CCTGCTCGCC AACCCCCGCG ACGTTTCCTT 

501 CCTGACCAAC CTCATCCAAA CCGTCTTCTA CCTGCGCGGC ATCGAAGTCG 

551 TACCGCCCGA ATACGCCGAC ACCGACGTAT TCGTAACCGT CGACGTATTC 

601 GGCACCGTCC GCAGCCGTAC CGAACTGCAC CTCTACAACG CCGAAACCCT 

651 TAAAGCCCAA ACCAAGCTCG AATATTTCGC CGTTGACCGC GACAGCCGGA 

701 AACTGCTGAT TACCCCTAAA ACCGCCGCCT ACGAATCCCA ATACCAAGAA 

751 CAATACGCCC TTTGGACCGG CCCTTACAAA GTCAGCAAAA CCGTCAAAGC 

801 CTCAGACCGC CTGATGGTCG ATTTCTCCGA CATTACCCCC TACGGCGACA 

851 CAACCGCCCA AAACCGTCCC GACTTCAAAC AAAACAACGG TAAAAAACCC 

901 GATGTCGGCA ACGAAGTCAT CCGCCGCCGC AAAGGAGGAT AA 

This corresponds to the amino acid sequence <SEQ ID 314; ORF83-l>: 



1 MKTLLLLIPL VLTACGTLTG IPAHGGGKRF AVEQELVAAS SRAAVKEMDL 

51 SALKGRKAAL YVSVMGDQGS GNISGGRYSI DALIRGGYHN NPESATQYSY 

101 PAYDTTATTK SDALSSVTTS TSLLNAPAAA LTKNSGRKGE RSAGLSVNGT 

151 GDYRNETLLA NPRDVSFLTN LIQTVFYLRG IEVVPPEYAD TDVFVTVDVF 

201 GTVRSRTELH LYNAETLKAQ TKLEYFAVDR DSRKLLITPK TAAYESQYQE 

251 QYALWTGPYK VSKTVKASDR LMVDFSDITP YGDTTAQNRP DFKQNNGKKP 

301 DVGNEVIRRR KGG* 

Computer analysis of this amino acid sequence gave the following results: 
Homology with a predicted ORF from N.meningitidis (strain A) 

ORF83 shows 96.4% identity over a 197aa overlap with an ORF (ORF83a) from strain A of 
N. meningitidis: 

                      10        20        30        40        50
orf83.pep     TLLLFIPLVLTXCGTLTGILAHGGGKRFAVEQELVAASSRAAVKEMDLSALKGRKAAX
              ||| :|||||| ||||||| |||||||||||||||||||||||||||||||||||||
orf83a      MKTLLXLIPLVLTACGTLTGIPAHGGGKRFAVEQELVAASSRAAVKEMDLSALKGRKAAL
                    10        20        30        40        50        60

            60        70        80        90       100       110
orf83.pep   YVSVMGDQGSGNISGGRYSIDALIRGGYHNNPESATQYSYPAYDTTATTKSDALSSVTTS
            ||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
orf83a      YVSVMGDQGSGNISGGRYSIDALIRGGYHNNPESATQYSYPAYDTTATTKSDALSSVTTS
                    70        80        90       100       110       120

           120       130       140       150       160       170
orf83.pep   TSLLNAPAAXLTKNSGRKGERSAGLSVNGTGDYRNETLLANPRDVSFLTNLIQTVFYLRG
            ||||||||| ||||||||||||||||||||||||||||||||||||||||||||||||||
orf83a      TSLLNAPAAALTKNSGRKGERSAGLSVNGTGDYRNETLLANPRDVSFLTNLIQTVFYLRG
                   130       140       150       160       170       180

           180       190
orf83.pep   IEVVPPXYADTDVFVTVDV
            |||||| ||||||||||||
orf83a      IEVVPPEYADTDVFVTVDVFGTVRSRTELHLYNAETLKAQTKLEYFAVDRDSRKLLIAPK
                   190       200       210       220       230       240

The complete length ORF83a nucleotide sequence <SEQ ID 315> is: 

1 ATGAAAACCC TGCTCNTCCT CATCCCCCTC GTCCTCACAG CCTGCGGCAC 

51 ACTGACCGGC ATACCCGCCC ACGGCGGCGG CAAACGCTTT GCCGTCGAAC 

101 AAGAACTCGT CGCCGCATCG TCCCGCGCCG CCGTCAAAGA AATGGACTTG 

151 TCCGCCCTGA AAGGACGCAA AGCCGCCCTT TACGTCTCCG TTATGGGCGA 

201 CCAAGGTTCG GGCAACATAA GCGGCGGACG CTACTCTATC GACGCACTGA 

251 TACGCGGCGG CTACCACAAC AACCCCGAAA GTGCCACCCA ATACAGCTAC 

301 CCCGCCTACG ACACTACCGC CACCACCAAA TCCGACGCGC TCTCCAGCGT

351 AACCACTTCC ACATCGCTTT TGAACGCCCC CGCCGCCGCC CTGACGAAAA 

401 ACAGCGGACG CAAAGGCGAA CGCTCCGCCG GACTGTCCGT CAACGGCACG 

451 GGCGACTACC GCAACGAAAC CCTGCTCGCC AACCCCCGCG ACGTTTCCTT 

501 CCTGACCAAC CTCATCCAAA CCGTCTTCTA CCTGCGCGGC ATCGAAGTCG

551 TACCGCCCGA ATACGCCGAC ACCGACGTAT TCGTAACCGT CGACGTATTC 

601 GGCACCGTCC GCAGCCGCAC CGAACTGCAC CTCTACAACG CCGAAACCCT 

651 TAAAGCCCAA ACCAAGCTCG AATATTTCGC CGTTGACCGC GACAGCCGGA 

701 AACTGCTGAT TGCCCCTAAA ACCGCCGCCT ACGAATCCCA ATACCAAGAA 

751 CAATACGCCC TCTGGATGGG ACCTTACAGC GTCGGCAAAA CCGTCAAAGC 

801 CTCAGACCGC CTGATGGTCG ATTTCTCCGA CATCACCCCC TACGGCGACA 

851 CAACCGCCCA AAACCGTCCC GACTTCAAAC AAAACAACGG TAAAAAACCC 

901 GATGTCGGCA ACGAAGTCAT CCGCCGCCGC AAAGGAGGAT AA 



This encodes a protein having amino acid sequence <SEQ ID 316>:



1 MKTLLXLIPL VLTACGTLTG IPAHGGGKRF AVEQELVAAS SRAAVKEMDL

51 SALKGRKAAL YVSVMGDQGS GNISGGRYSI DALIRGGYHN NPESATQYSY

101 PAYDTTATTK SDALSSVTTS TSLLNAPAAA LTKNSGRKGE RSAGLSVNGT

151 GDYRNETLLA NPRDVSFLTN LIQTVFYLRG IEVVPPEYAD TDVFVTVDVF

201 GTVRSRTELH LYNAETLKAQ TKLEYFAVDR DSRKLLIAPK TAAYESQYQE 

251 QYALWMGPYS VGKTVKASDR LMVDFSDITP YGDTTAQNRP DFKQNNGKKP 

301 DVGNEVIRRR KGG* 

ORF83a and ORF83-1 show 98.4% identity in 313 aa overlap: 



                    10        20        30        40        50        60
orf83a.pep  MKTLLXLIPLVLTACGTLTGIPAHGGGKRFAVEQELVAASSRAAVKEMDLSALKGRKAAL
            ||||| ||||||||||||||||||||||||||||||||||||||||||||||||||||||
orf83-1     MKTLLLLIPLVLTACGTLTGIPAHGGGKRFAVEQELVAASSRAAVKEMDLSALKGRKAAL
                    10        20        30        40        50        60

                    70        80        90       100       110       120
orf83a.pep  YVSVMGDQGSGNISGGRYSIDALIRGGYHNNPESATQYSYPAYDTTATTKSDALSSVTTS
            ||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
orf83-1     YVSVMGDQGSGNISGGRYSIDALIRGGYHNNPESATQYSYPAYDTTATTKSDALSSVTTS
                    70        80        90       100       110       120

                   130       140       150       160       170       180
orf83a.pep  TSLLNAPAAALTKNSGRKGERSAGLSVNGTGDYRNETLLANPRDVSFLTNLIQTVFYLRG
            ||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
orf83-1     TSLLNAPAAALTKNSGRKGERSAGLSVNGTGDYRNETLLANPRDVSFLTNLIQTVFYLRG
                   130       140       150       160       170       180

                   190       200       210       220       230       240
orf83a.pep  IEVVPPEYADTDVFVTVDVFGTVRSRTELHLYNAETLKAQTKLEYFAVDRDSRKLLIAPK
            |||||||||||||||||||||||||||||||||||||||||||||||||||||||||:||
orf83-1     IEVVPPEYADTDVFVTVDVFGTVRSRTELHLYNAETLKAQTKLEYFAVDRDSRKLLITPK
                   190       200       210       220       230       240

                   250       260       270       280       290       300
orf83a.pep  TAAYESQYQEQYALWMGPYSVGKTVKASDRLMVDFSDITPYGDTTAQNRPDFKQNNGKKP
            ||||||||||||||| |||:|:||||||||||||||||||||||||||||||||||||||
orf83-1     TAAYESQYQEQYALWTGPYKVSKTVKASDRLMVDFSDITPYGDTTAQNRPDFKQNNGKKP
                   250       260       270       280       290       300

                   310
orf83a.pep  DVGNEVIRRRKGGX
            ||||||||||||||
orf83-1     DVGNEVIRRRKGGX
                   310
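
The identity figures quoted for these pairwise alignments can be checked by hand. As a minimal sketch (the alignment program used for the original analysis is not named in this section, so the treatment of the ambiguity code X below is an assumption, and the helper name is hypothetical), the following counts identical columns in two pre-aligned sequences of equal length. The fragment shown is residues 241-260 of ORF83a and ORF83-1 as listed above.

```python
def percent_identity(a: str, b: str) -> tuple[int, float]:
    """Return (identical columns, percent identity) for two pre-aligned sequences."""
    if len(a) != len(b):
        raise ValueError("sequences must be aligned to equal length")
    if not a:
        raise ValueError("empty alignment")
    # Assumption: a column counts as identical only when both residues match
    # and neither is the ambiguity code X.
    same = sum(1 for x, y in zip(a, b) if x == y and x != "X")
    return same, round(100.0 * same / len(a), 1)


orf83a_241_260 = "TAAYESQYQEQYALWMGPYS"
orf83_1_241_260 = "TAAYESQYQEQYALWTGPYK"
identical, pct = percent_identity(orf83a_241_260, orf83_1_241_260)
print(identical, pct)
```

Under the same assumption, the full overlap has five differing columns out of 313, and 308/313 rounds to the 98.4% stated above.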

Homology with a predicted ORF from N. gonorrhoeae 

ORF83 shows 94.9% identity over a 197aa overlap with a predicted ORF (ORF83.ng) from N.
gonorrhoeae:



orf83.pep     TLLLFIPLVLTXCGTLTGILAHGGGKRFAVEQELVAASSRAAVKEMDLSALKGRKAAX  58
              ||||:|||||| ||||||| |||||||||||||||||||||||||||||||||||||
orf83ng     MKTLLLLIPLVLTACGTLTGIPAHGGGKRFAVEQELVAASSRAAVKEMDLSALKGRKAAL  60

orf83.pep   YVSVMGDQGSGNISGGRYSIDALIRGGYHNNPESATQYSYPAYDTTATTKSDALSSVTTS  118
            ||||||||||||||||||||||||||||||||:|||:||||||||||||||||||:||||
orf83ng     YVSVMGDQGSGNISGGRYSIDALIRGGYHNNPDSATRYSYPAYDTTATTKSDALSGVTTS  120

orf83.pep   TSLLNAPAAXLTKNSGRKGERSAGLSVNGTGDYRNETLLANPRDVSFLTNLIQTVFYLRG  178
            ||||||||| ||||:|||||||||||||||||||||||||||||||||||||||||||||
orf83ng     TSLLNAPAAALTKNNGRKGERSAGLSVNGTGDYRNETLLANPRDVSFLTNLIQTVFYLRG  180

orf83.pep   IEVVPPXYADTDVFVTVDV  197
            |||||| ||||||||||||
orf83ng     IEVVPPEYADTDVFVTVDVFGTVRSRTELHLYNAETLKAQTKLEYFAVDRDSRKLLIAPK  240



The complete length ORF83ng nucleotide sequence <SEQ ID 317> is: 



1 ATGAAAACCC TGCTCCTCCT CATCCCCCTC GTACTCACCG CCTGCGGCAC

51 ACTGACCGGC ATACCCGCCC ACGGCGGCGG CAAACGCTTT GCCGTCGAAC

101 AGGAACTCGT CGCCGCATCG TCCCGCGCCG CCGTCAAAGA AATGGACTTG

151 TCCGCCCTGA AAGGACGCAA AGCCGCCCTT TACGTCTCCG TTATGGGCGA

201 CCAAGGTTCG GGCAACATAA GCGGCGGACG CTACTCCATC GACGCACTGA

251 TACGCGGCGG CTACCACAAC AACCCCGACA GCGCCACCCG ATACAGCTAC

301 CCCGCCTATG ACACTACCGC CACCACCAAA TCCGACGCGC TCTCCGGCGT

351 AACCACTTCC ACATCGCTTT TGAACGCCCC CGCCGCCGCC CTGACGAAAA

401 ACAACGGACG CAAAGGCGAA CGCTCCGCCG GACTGTCCGT CAACGGCACG

451 GGCGACTACC GCAACGAT^AC CCTGCTCGCC AACCCCCGCG ACGTTTCCTT

501 CCTGACCAAC CTCATCCAAA CCGTCTTCTA CCTGCGCGGC ATCGAAGTCG

551 TACCGCCCGA ATACGCCGAC ACCGACGTAT TCGTAACCGT CGACGTATTC

601 GGCACCGTCC GCAGCCGTAC CGAACTGCAC CTCTACAACG CCGAAACCCT

651 TAAAGCCCAA ACCAAGCTCG AATATTTCGC CGTCGACCGC GACAGCCGGA

701 AACTGCTGAT TGCCCCTAAA ACCGCCGCCT ACGAATCCCA ATACCAAGAA 

751 CAATACGCCC TCTGGATGGG ACCTTACAGC GTCGGCAAAA CCGTCAAAGC 

801 CTCAGACCGC CTGATGGTCG ATTTCTCCGA CATCACCCCC TACGGCGACA 

851 CAACCGCCCA AAACCGTCCC GACTTCAAAC AAAACAACGG TAAAAACCCC 

901 GATGTCGGCA ACGAAGTCAT CCGCCGCCGC AAAGGAGGAT AA 

This encodes a protein having amino acid sequence <SEQ ID 318>: 

1 MKTLLLLIPL VLTACGTLTG IPAHGGGKRF AVEQELVAAS SRAAVKEMDL

51 SALKGRKAAL YVSVMGDQGS GNISGGRYSI DALIRGGYHN NPDSATRYSY

101 PAYDTTATTK SDALSGVTTS TSLLNAPAAA LTKNNGRKGE RSAGLSVNGT

151 GDYRNETLLA NPRDVSFLTN LIQTVFYLRG IEVVPPEYAD TDVFVTVDVF

201 GTVRSRTELH LYNAETLKAQ TKLEYFAVDR DSRKLLIAPK TAAYESQYQE

251 QYALWMGPYS VGKTVKASDR LMVDFSDITP YGDTTAQNRP DFKQNNGKNP

301 DVGNEVIRRR KGG* 

ORF83ng and ORF83-1 show 97.1% identity in 313 aa overlap 



                    10        20        30        40        50        60
orf83-1.pep MKTLLLLIPLVLTACGTLTGIPAHGGGKRFAVEQELVAASSRAAVKEMDLSALKGRKAAL
            ||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
orf83ng     MKTLLLLIPLVLTACGTLTGIPAHGGGKRFAVEQELVAASSRAAVKEMDLSALKGRKAAL
                    10        20        30        40        50        60

                    70        80        90       100       110       120
orf83-1.pep YVSVMGDQGSGNISGGRYSIDALIRGGYHNNPESATQYSYPAYDTTATTKSDALSSVTTS
            ||||||||||||||||||||||||||||||||:|||:||||||||||||||||||:||||
orf83ng     YVSVMGDQGSGNISGGRYSIDALIRGGYHNNPDSATRYSYPAYDTTATTKSDALSGVTTS
                    70        80        90       100       110       120

                   130       140       150       160       170       180
orf83-1.pep TSLLNAPAAALTKNSGRKGERSAGLSVNGTGDYRNETLLANPRDVSFLTNLIQTVFYLRG
            ||||||||||||||:|||||||||||||||||||||||||||||||||||||||||||||
orf83ng     TSLLNAPAAALTKNNGRKGERSAGLSVNGTGDYRNETLLANPRDVSFLTNLIQTVFYLRG
                   130       140       150       160       170       180

                   190       200       210       220       230       240
orf83-1.pep IEVVPPEYADTDVFVTVDVFGTVRSRTELHLYNAETLKAQTKLEYFAVDRDSRKLLITPK
            |||||||||||||||||||||||||||||||||||||||||||||||||||||||||:||
orf83ng     IEVVPPEYADTDVFVTVDVFGTVRSRTELHLYNAETLKAQTKLEYFAVDRDSRKLLIAPK
                   190       200       210       220       230       240

                   250       260       270       280       290       300
orf83-1.pep TAAYESQYQEQYALWTGPYKVSKTVKASDRLMVDFSDITPYGDTTAQNRPDFKQNNGKKP
            ||||||||||||||| |||:|:||||||||||||||||||||||||||||||||||||:|
orf83ng     TAAYESQYQEQYALWMGPYSVGKTVKASDRLMVDFSDITPYGDTTAQNRPDFKQNNGKNP
                   250       260       270       280       290       300

                   310
orf83-1.pep DVGNEVIRRRKGGX
            ||||||||||||||
orf83ng     DVGNEVIRRRKGGX
                   310

Based on this analysis, including the presence of a putative ATP/GTP-binding site motif A (P-loop)
in the gonococcal protein (double-underlined) and a putative prokaryotic membrane lipoprotein
lipid attachment site (single-underlined), it is predicted that the proteins from N.meningitidis and
N.gonorrhoeae, and their epitopes, could be useful antigens for vaccines or diagnostics, or for
raising antibodies.

Example 38

The following DNA sequence, believed to be complete, was identified in N.meningitidis <SEQ ID 
319>: 

1 ATGGCAGAGA TCTGTTTGAT AACCGGCACG CCCGGTTCAG GGAAAACATT 

51 AAAAATGGTT TCCATGATGG CGAATGATGA AATGTTTAAG CCTGATGAAA 

101 AAGCCATACG CCGTAAAGTA TTTACGAACA TAAAAGGCTT GAAAATACCG 

151 CACACCTACA TAGAAACGGA CGCAAAAAAG CTGCCGAAAT CGACAGATGA 

201 GCAGCTTTCG GCGCATGATA TGTACGAATG GATAAAGAAG CCCGAAAATA

251 TCGGGTCTAT TGTCATTGTA GATGAAGCTC AAGACGTATG GCCGGCACGC 

301 TCGGCAGGTT CAAAAATCCC TGAAAATGTC CAATGGCTGA ATACGCACAG 

351 ACATCAGGGC ATTGATATAT TTGTTTTGAC TCAAGGTCCT AAGCTTCTAG 

401 ATCAAAATCT TAGAACGCTT GTACGGAAAC ATTACCACAT CGCTTCAAAC

451 AAGATGGGTA TGCGTACGCT TTTAGAATGG AAAATATGCG CGGACGATCC 

501 CGTAAAAATG GCATCAAGCG CATTCTCCAG TATCTATACA CTGGATAAAA 

551 AAGTTTATGA CTTGTAysrr TmmGCGGAAG TTCATACCGT AAATAAGGTC 

601 AAGCGGTCAA AGTGGTTTTA CACTCTGCCa GTAATAGTAT TGCTGATTCC 

651 CGTGTTTGTC GGCCTGTCCT ATAAAATGTT GagCaGTTAC GGAAAAAAAC 

701 aGGAAGAACC CGCAGCACAA GAATCGGCGG CAACAGAACA GCAGGCAGTA 

751 CTTCCGGATA AAACAGAAGG CGAGCCGGTA AATAACGGCA ACCTTACCGC 

801 AGATATGTTT GTTCCGACAT TGTCCGAaAA ACCCGrAAGC AAGCcgaTTT 

851 ATAACGGTGT AAGGCAGGTA AGAACCTTTG AATATATAGC AGGCTGTATA 

901 GAAGGCGGAA GAACCGGATG CGCCTGCTAT TCGCaTCAAG GGACGGCATt 

951 gaAAGAAGTG ACGGaGTTGA TGTGccaAgG aCTATGTaAA AAacGGCTTG 

1001 CCGTTTAACC CaTACAAAGA AGAAAGCCAA GGGCAGGAAG TTCAGCAAAG 

1051 CGCGCAgCAA CATTCGGACA GGGCGcCAAG TTGCCACATT GGGCGGAAAA 

1101 CCGTAGCAGA ACCTAATGTA CGATAATTGG GAAGAACGCG GGAAACCGTT 

1151 TGAAGGAATC GGaCGGGGGC GTGGTCGGAT CGGCAAACTG A 

This corresponds to the amino acid sequence <SEQ ID 320; ORF84>: 

1 MAEICLITGT PGSGKTLKMV SMMANDEMFK PDEKAIRRKV FTNIKGLKIP 

51 HTYIETDAKK LPKSTDEQLS AHDMYEWIKK PENIGSIVIV DEAQDVWPAR 

101 SAGSKIPENV QWLNTHRHQG IDIFVLTQGP KLLDQNLRTL VRKHYHIASN 

151 KMGMRTLLEW KICADDPVKM ASSAFSSIYT LDKKVYDLYX XAEVHTVNKV 

201 KRSKWFYTLP VIVLLIPVFV GLSYKMLSSY GKKQEEPAAQ ESAATEQQAV 

251 LPDKTEGEPV NNGNLTADMF VPTLSEKPXS KPIYNGVRQV RTFEYIAGCI 

301 EGGRTGCACY SHQGTALKEV TELMCKDYVK NGLPFNPYKE ESQGQEVQQS 

351 AQQHSDRAQV ATLGGKPXQN LMYDNWEERG KPFEGIGGGV VGSAN* 

Further work revealed the complete nucleotide sequence <SEQ ID 321>: 

1 ATGGCAGAGA TCTGTTTGAT AACCGGCACG CCCGGTTCAG GGAAAACATT 

51 AAAAATGGTT TCCATGATGG CGAATGATGA AATGTTTAAG CCTGATGAAA 

101 ACGGCATACG CCGTAAAGTA TTTACGAACA TAAAAGGCTT GAAAATACCG 

151 CACACCTACA TAGAAACGGA CGCAAAAAAG CTGCCGAAAT CGACAGATGA 

201 GCAGCTTTCG GCGCATGATA TGTACGAATG GATAAAGAAG CCCGAAAATA 

251 TCGGGTCTAT TGTCATTGTA GATGAAGCTC AAGACGTATG GCCGGCACGC 

301 TCGGCAGGTT CAAAAATCCC TGAAAATGTC CAATGGCTGA ATACGCACAG 

351 ACATCAGGGC ATTGATATAT TTGTTTTGAC TCAAGGTCCT AAGCTTCTAG 

401 ATCAAAATCT TAGAACGCTT GTACGGAAAC ATTACCACAT CGCTTCAAAC 

451 AAGATGGGTA TGCGTACGCT TTTAGAATGG AAAATATGCG CGGACGATCC 

501 CGTAAAAATG GCATCAAGCG CATTCTCCAG TATCTATACA CTGGATAAAA 

551 AAGTTTATGA CTTGTACGAA TCAGCGGAAG TTCATACCGT AAATAAGGTC 

601 AAGCGGTCAA AGTGGTTTTA CACTCTGCCA GTAATAGTAT TGCTGATTCC 

651 CGTGTTTGTC GGCCTGTCCT ATAAAATGTT GAGCAGTTAC GGAAAAAAAC 

701 AGGAAGAACC CGCAGCACAA GAATCGGCGG CAACAGAACA GCAGGCAGTA 

751 CTTCCGGATA AAACAGAAGG CGAGCCGGTA AATAACGGCA ACCTTACCGC 

801 AGATATGTTT GTTCCGACAT TGTCCGAAAA ACCCGAAAGC AAGCCGATTT 

851 ATAACGGTGT AAGGCAGGTA AGAACCTTTG AATATATAGC AGGCTGTATA 

901 GAAGGCGGAA GAACCGGATG CGCCTGCTAT TCGCATCAAG GGACGGCATT 

951 GAAAGAAGTG ACGGAGTTGA TGTGCAAGGA CTATGTAAAA AACGGCTTGC 

1001 CGTTTAACCC ATACAAAGAA GAAAGCCAAG GGCAGGAAGT TCAGCAAAGC 

1051 GCGCAGCAAC ATTCGGACAG GGCGCAAGTT GCCACATTGG GCGGAAAACC 

1101 GTAGCAGAAC CTAATGTACG ATAATTGGGA AGAACGCGGG AAACCGTTTG 

1151 AAGGAATCGG CGGGGGCGTG GTCGGATCGG CAAACTGA 



This corresponds to the amino acid sequence <SEQ ID 322; ORF84-1>:

1 MAEICLITGT PGSGKTLKMV SMMANDEMFK PDENGIRRKV FTNIKGLKIP

51 HTYIETDAKK LPKSTDEQLS AHDMYEWIKK PENIGSIVIV DEAQDVWPAR 

101 SAGSKIPENV QWLNTHRHQG IDIFVLTQGP KLLDQNLRTL VRKHYHIASN 

151 KMGMRTLLEW KICADDPVKM ASSAFSSIYT LDKKVYDLYE SAEVHTVNKV 

201 KRSKWFYTLP VIVLLIPVFV GLSYKMLSSY GKKQEEPAAQ ESAATEQQAV

251 LPDKTEGEPV NNGNLTADMF VPTLSEKPES KPIYNGVRQV RTFEYIAGCI 

301 EGGRTGCACY SHQGTALKEV TELMCKDYVK NGLPFNPYKE ESQGQEVQQS 

351 AQQHSDRAQV ATLGGKP*QN LMYDNWEERG KPFEGIGGGV VGSAN* 
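
The relationship between SEQ ID 321 and SEQ ID 322 can be spot-checked by translating the nucleotide sequence with the standard genetic code. This is a minimal sketch with hypothetical names, not the tool used in the original analysis. It assumes reading frame 1, reports stop codons as * and reports any codon containing a character other than A, C, G or T as X.

```python
from itertools import product

BASES = "TCAG"
# Standard genetic code, codons enumerated in TCAG order (TTT, TTC, TTA, ...).
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {"".join(c): aa for c, aa in zip(product(BASES, repeat=3), AMINO_ACIDS)}


def translate(dna: str) -> str:
    """Translate frame 1 of a DNA string; ambiguous codons become X."""
    dna = "".join(dna.split()).upper()  # drop the spaces between 10-base blocks
    codons = (dna[i:i + 3] for i in range(0, len(dna) - len(dna) % 3, 3))
    return "".join(CODON_TABLE.get(codon, "X") for codon in codons)


# First 30 bases of SEQ ID 321 as listed above.
start_of_seq_321 = "ATGGCAGAGA TCTGTTTGAT AACCGGCACG"
peptide = translate(start_of_seq_321)
print(peptide)
```

The result corresponds to residues 1-10 of ORF84-1. Because the function does not stop at the first stop codon, an internal stop appears as * inside the translated string, as in the listing above.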

Computer analysis of this amino acid sequence gave the following results: 



Homology with a predicted ORF from N.meningitidis (strain A)

ORF84 shows 93.9% identity over a 395aa overlap with an ORF (ORF84a) from strain A of N. 
meningitidis: 

                    10        20        30        40        50        60
orf84.pep   MAEICLITGTPGSGKTLKMVSMMANDEMFKPDEKAIRRKVFTNIKGLKIPHTYIETDAKK
            |||||||||||||||||||||||||||||||||::|||||||||||||||||||||||||
orf84a      MAEICLITGTPGSGKTLKMVSMMANDEMFKPDENGIRRKVFTNIKGLKIPHTYIETDAKK
                    10        20        30        40        50        60

                    70        80        90       100       110       120
orf84.pep   LPKSTDEQLSAHDMYEWIKKPENIGSIVIVDEAQDVWPARSAGSKIPENVQWLNTHRHQG
            ||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
orf84a      LPKSTDEQLSAHDMYEWIKKPENIGSIVIVDEAQDVWPARSAGSKIPENVQWLNTHRHQG
                    70        80        90       100       110       120

                   130       140       150       160       170       180
orf84.pep   IDIFVLTQGPKLLDQNLRTLVRKHYHIASNKMGMRTLLEWKICADDPVKMASSAFSSIYT
            ||||||||| ||||||||||||||||||||||||||||||||||||||||||||||||||
orf84a      IDIFVLTQGSKLLDQNLRTLVRKHYHIASNKMGMRTLLEWKICADDPVKMASSAFSSIYT
                   130       140       150       160       170       180

                   190       200       210       220       230       240
orf84.pep   LDKKVYDLYXXAEVHTVNKVKRSKWFYTLPVIVLLIPVFVGLSYKMLSSYGKKQEEPAAQ
            |||||||||  |||||||||||||||||||||:|||||||||||||||||||||||||||
orf84a      LDKKVYDLYESAEVHTVNKVKRSKWFYTLPVIILLIPVFVGLSYKMLSSYGKKQEEPAAQ
                   190       200       210       220       230       240

                   250       260       270       280       290       300
orf84.pep   ESAATEQQAVLPDKTEGEPVNNGNLTADMFVPTLSEKPXSKPIYNGVRQVRTFEYIAGCI
            ||||||:|||: |||||||||||||||||||||||||| ||||||||||||||||||||:
orf84a      ESAATEHQAVFQDKTEGEPVNNGNLTADMFVPTLSEKPESKPIYNGVRQVRTFEYIAGCV
                   250       260       270       280       290       300

                   310       320       330       340       350       360
orf84.pep   EGGRTGCACYSHQGTALKEVTELMCKDYVKNGLPFNPYKEESQGQEVQQSAQQHSDRAQV
            |||||||:|||||||||||:|: |||||::||||||||||||||::|||| |:|||| ||
orf84a      EGGRTGCTCYSHQGTALKEITKEMCKDYARNGLPFNPYKEESQGRDVQQSEQHHSDRPQV
                   310       320       330       340       350       360

                   370       380       390
orf84.pep   ATLGGKPXQNLMYDNWEERGKPFEGIGGGVVGSANX
            ||||||| ||||||||:|||||||||||||||||||
orf84a      ATLGGKPWQNLMYDNWQERGKPFEGIGGGVVGSANX
                   370       380       390

The complete length ORF84a nucleotide sequence <SEQ ID 323> is: 



1 ATGGCAGAGA TCTGTTTGAT AACCGGCACG CCCGGTTCAG GGAAAACATT 

51 AAAAATGGTT TCCATGATGG CAAACGATGA AATGTTTAAG CCGGATGAAA 

101 ACGGCATACG CCGTAAAGTA TTTACGAACA TCAAAGGCTT GAAGATACCG 

151 CACACCTACA TAGAAACGGA CGCGAAAAAG CTGCCGAAAT CGACAGATGA 

201 GCAGCTTTCG GCGCATGATA TGTACGAATG GATAAAGAAG CCCGAAAATA

251 TCGGGTCTAT TGTCATTGTA GATGAAGCTC AAGACGTATG GCCGGCACGC 

301 TCGGCAGGTT CAAAAATCCC TGAAAATGTC CAATGGCTGA ATACGCACAG 

351 ACATCAGGGC ATTGATATAT TTGTTTTGAC TCAAGGCTCT AAGCTTCTAG 

401 ATCAAAATCT TAGAACGCTT GTACGGAAAC ATTACCACAT CGCTTCAAAC 

451 AAGATGGGTA TGCGTACGCT TTTAGAATGG AAAATATGCG CGGACGATCC

501 CGTAAAAATG GCATCAAGCG CATTCTCCAG TATCTATACA CTGGATAAAA

551 AAGTTTATGA CTTGTACGAA TCAGCGGAAG TTCATACCGT AAATAAGGTC

601 AAGCGGTCAA AATGGTTTTA TACTCTGCCA GTAATAATAT TGCTGATTCC

651 CGTTTTTGTC GGCCTGTCCT ATAAAATGTT AAGTAGTTAT GGAAAAAAAC

701 AGGAAGAACC CGCAGCACAA GAATCGGCGG CAACAGAACA TCAGGCAGTA

751 TTTCAGGATA AAACAGAAGG CGAGCCGGTA AACAACGGTA ACCTTACCGC

801 AGATATGTTT GTTCCGACAT TGTCCGAAAA ACCCGAAAGC AAGCCGATTT

851 ATAACGGTGT AAGGCAGGTA AGAACCTTTG AATATATAGC AGGCTGTGTA

901 GAAGGCGGAA GAACCGGATG CACATGCTAT TCGCATCAAG GGACGGCATT

951 GAAAGAAATT ACAAAGGAAA TGTGCAAGGA TTACGCAAGA AACGGATTGC

1001 CGTTTAACCC ATATAAAGAA GAAAGCCAAG GGCGGGATGT CCAGCAAAGT

1051 GAGCAGCACC ATTCGGACAG ACCGCAAGTT GCCACGTTGG GCGGAAAGCC

1101 GTGGCAAAAT CTTATGTATG ATAATTGGCA GGAGCGCGGA AAACCGTTTG

1151 AAGGAATCGG CGGGGGCGTG GTCGGATCGG CAAACTGA



This encodes a protein having amino acid sequence <SEQ ID 324>:

1 MAEICLITGT PGSGKTLKMV SMMANDEMFK PDENGIRRKV FTNIKGLKIP

51 HTYIETDAKK LPKSTDEQLS AHDMYEWIKK PENIGSIVIV DEAQDVWPAR

101 SAGSKIPENV QWLNTHRHQG IDIFVLTQGS KLLDQNLRTL VRKHYHIASN

151 KMGMRTLLEW KICADDPVKM ASSAFSSIYT LDKKVYDLYE SAEVHTVNKV

201 KRSKWFYTLP VIILLIPVFV GLSYKMLSSY GKKQEEPAAQ ESAATEHQAV

251 FQDKTEGEPV NNGNLTADMF VPTLSEKPES KPIYNGVRQV RTFEYIAGCV

301 EGGRTGCTCY SHQGTALKEI TKEMCKDYAR NGLPFNPYKE ESQGRDVQQS

351 EQHHSDRPQV ATLGGKPWQN LMYDNWQERG KPFEGIGGGV VGSAN*



ORF84a and ORF84-1 show 95.2% identity in 395 aa overlap: 



                    10        20        30        40        50        60
orf84a.pep  MAEICLITGTPGSGKTLKMVSMMANDEMFKPDENGIRRKVFTNIKGLKIPHTYIETDAKK
            ||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
orf84-1     MAEICLITGTPGSGKTLKMVSMMANDEMFKPDENGIRRKVFTNIKGLKIPHTYIETDAKK
                    10        20        30        40        50        60

                    70        80        90       100       110       120
orf84a.pep  LPKSTDEQLSAHDMYEWIKKPENIGSIVIVDEAQDVWPARSAGSKIPENVQWLNTHRHQG
            ||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
orf84-1     LPKSTDEQLSAHDMYEWIKKPENIGSIVIVDEAQDVWPARSAGSKIPENVQWLNTHRHQG
                    70        80        90       100       110       120

                   130       140       150       160       170       180
orf84a.pep  IDIFVLTQGSKLLDQNLRTLVRKHYHIASNKMGMRTLLEWKICADDPVKMASSAFSSIYT
            ||||||||| ||||||||||||||||||||||||||||||||||||||||||||||||||
orf84-1     IDIFVLTQGPKLLDQNLRTLVRKHYHIASNKMGMRTLLEWKICADDPVKMASSAFSSIYT
                   130       140       150       160       170       180

                   190       200       210       220       230       240
orf84a.pep  LDKKVYDLYESAEVHTVNKVKRSKWFYTLPVIILLIPVFVGLSYKMLSSYGKKQEEPAAQ
            ||||||||||||||||||||||||||||||||:|||||||||||||||||||||||||||
orf84-1     LDKKVYDLYESAEVHTVNKVKRSKWFYTLPVIVLLIPVFVGLSYKMLSSYGKKQEEPAAQ
                   190       200       210       220       230       240

                   250       260       270       280       290       300
orf84a.pep  ESAATEHQAVFQDKTEGEPVNNGNLTADMFVPTLSEKPESKPIYNGVRQVRTFEYIAGCV
            ||||||:|||: |||||||||||||||||||||||||||||||||||||||||||||||:
orf84-1     ESAATEQQAVLPDKTEGEPVNNGNLTADMFVPTLSEKPESKPIYNGVRQVRTFEYIAGCI
                   250       260       270       280       290       300

                   310       320       330       340       350       360
orf84a.pep  EGGRTGCTCYSHQGTALKEITKEMCKDYARNGLPFNPYKEESQGRDVQQSEQHHSDRPQV
            |||||||:|||||||||||:|: |||||::||||||||||||||::|||| |:|||| ||
orf84-1     EGGRTGCACYSHQGTALKEVTELMCKDYVKNGLPFNPYKEESQGQEVQQSAQQHSDRAQV
                   310       320       330       340       350       360

                   370       380       390
orf84a.pep  ATLGGKPWQNLMYDNWQERGKPFEGIGGGVVGSANX
            ||||||| ||||||||:|||||||||||||||||||
orf84-1     ATLGGKPXQNLMYDNWEERGKPFEGIGGGVVGSANX
                   370       380       390

Homology with a predicted ORF from N.gonorrhoeae

ORF84 shows 94.2% identity over a 395aa overlap with a predicted ORF (ORF84.ng) from N.
gonorrhoeae:

orf84.pep   MAEICLITGTPGSGKTLKMVSMMANDEMFKPDEKAIRRKVFTNIKGLKIPHTYIETDAKK  60
            |||||||||||||||||||||||||||||||||:::||||||||||||||||:|||||||
orf84ng     MAEICLITGTPGSGKTLKMVSMMANDEMFKPDENGVRRKVFTNIKGLKIPHTHIETDAKK  60

orf84.pep   LPKSTDEQLSAHDMYEWIKKPENIGSIVIVDEAQDVWPARSAGSKIPENVQWLNTHRHQG  120
            |||||||||||||||||||||||:|:||||||||||||||||||||||||||||||||||
orf84ng     LPKSTDEQLSAHDMYEWIKKPENVGAIVIVDEAQDVWPARSAGSKIPENVQWLNTHRHQG  120

orf84.pep   IDIFVLTQGPKLLDQNLRTLVRKHYHIASNKMGMRTLLEWKICADDPVKMASSAFSSIYT  180
            |||||||||||||||||||||::|||||:||||:|||||||:||||||||||||||||||
orf84ng     IDIFVLTQGPKLLDQNLRTLVKRHYHIAANKMGLRTLLEWKVCADDPVKMASSAFSSIYT  180

orf84.pep   LDKKVYDLYXXAEVHTVNKVKRSKWFYTLPVIVLLIPVFVGLSYKMLSSYGKKQEEPAAQ  240
            |||||||||  ||:|||||||||||||:||||:||||:|||||||||:||||||||||||
orf84ng     LDKKVYDLYESAEIHTVNKVKRSKWFYALPVIILLIPLFVGLSYKMLGSYGKKQEEPAAQ  240

orf84.pep   ESAATEQQAVLPDKTEGEPVNNGNLTADMFVPTLSEKPXSKPIYNGVRQVRTFEYIAGCI  300
            |||||||||||||||||| ||||||||||||||| ||| |||||||||||||||||||||
orf84ng     ESAATEQQAVLPDKTEGESVNNGNLTADMFVPTLPEKPESKPIYNGVRQVRTFEYIAGCI  300

orf84.pep   EGGRTGCACYSHQGTALKEVTELMCKDYVKNGLPFNPYKEESQGQEVQQSAQQHSDRAQV  360
            |||||||:||||||||||||||||||||||||||||||||||||||||||||||||||||
orf84ng     EGGRTGCTCYSHQGTALKEVTELMCKDYVKNGLPFNPYKEESQGQEVQQSAQQHSDRAQV  360

orf84.pep   ATLGGKPXQNLMYDNWEERGKPFEGIGGGVVGSAN  395
            ||||||| |||||||||||||||||||||||||||
orf84ng     ATLGGKPQQNLMYDNWEERGKPFEGIGGGVVGSAN  395

The complete length ORF84ng nucleotide sequence <SEQ ID 325> is: 



1 ATGGCAGAAA TCTGTTTGAT AACCGGCACG CCCGGTTCAG GGAAAACATT 

51 AAAAATGGTT TCCATGATGG CAAACGATGA AATGTTTAAG CCAGATGAAA 

101 ACGGCGTACG CCGTAAAGTA TTTACGAACA TCAAAGGTTT GAAGATACCG

151 CACACCCACA TAGAAACAGA CGCT^GAAG CTGCCGAAAT CAACCGATGA 

201 ACAGCTTTCG GCGCATGATA TGTATGAATG GATCAAGAAG CCTGAAAacg 

251 tcggcgCAAT CGTTATTGTC GATGAGGCGC AAGACGTATG GCCCGCACGC 

301 TccgCAGGTT CGAAAATCCC CGAAAACGTC CAATGGCTGA ACACACACAG 

351 GCATCAGGGC ATAGATATAT TTGTATTGAC ACAAGGTCCT AAACTCTTAG 

401 ATCAGAACTT GCGAACATTG GTTAAAAGAC ATTACCACAT TGCGGCCAAC 

451 AAAATGGGTT TGCGTACCCT GCTTGAATGG AAAGTATGCG CGGATGACCC 

501 GGTAAAAATG GCATCAAGTG CATTTTCCAG TATCTACACA CTGGATAAAA 

551 AAGTTTATGA CTTGTACGAA TCCGCAGAAA TTCACACGGT AAACAAAGTC 

601 AAGCGTTCAA AATGGTTTTA TGCATTGCCC GTCATCATAT TATTGATTCC 

651 GCTATTTGTC GGTTTGTCTT ACAAAATGTT GGGCAGTTAC GGAAAAAAAC 

701 AGGAAGAACC CGCAGCACAA GAATCGGCGG CAACAGAACA GCAGGCAGTA 

751 CTTCCGGATA AAACAGAAGG AGAATCGGTG AATAACGGAA ACCTTACGGC 

801 AGATATGTTT GTTCCGACAT TGCCCGAAAA ACCCGAAAGC AAGCCGATTT 

851 ATAACGGTGT AAGGCAGGTA AGGACCTTTG AATATATAGC AGGCTGTATA 

901 GAAGGCGGAA GAACCGGATG CACCTGCTAT TCGCATCAAG GGACGGCATT 

951 GAAAGAAGTG ACGGAGTTGA TGTGCAAGGA CTATGTAAAA AACGGCTTGC 

1001 CGTTTAACCC ATACAAAGAA GAAAGCCAAG GGCAGGAAGT TCAGCAAAGC 

1051 GCGCAGCAAC ATTCGGACAG GGCGCAAGTT GCCACCTTGG GCGGAAT^CC 

1101 GCAGCAGAAC CTAATGTACG ACAATTGGGA AGAACGCGGG AAACCGTTTG 

1151 AAGGAATCGG CGGGGGCGTG GTCGGATCGG CAAACTGA 

This encodes a protein having amino acid sequence <SEQ ID 326>: 



1 MAEICLITGT PGSGKTLKMV SMMANDEMFK PDENGVRRKV FTNIKGLKIP

51 HTHIETDAKK LPKSTDEQLS AHDMYEWIKK PENVGAIVIV DEAQDVWPAR

101 SAGSKIPENV QWLNTHRHQG IDIFVLTQGP KLLDQNLRTL VKRHYHIAAN

151 KMGLRTLLEW KVCADDPVKM ASSAFSSIYT LDKKVYDLYE SAEIHTVNKV

201 KRSKWFYALP VIILLIPLFV GLSYKMLGSY GKKQEEPAAQ ESAATEQQAV

251 LPDKTEGESV NNGNLTADMF VPTLPEKPES KPIYNGVRQV RTFEYIAGCI

301 EGGRTGCTCY SHQGTALKEV TELMCKDYVK NGLPFNPYKE ESQGQEVQQS

351 AQQHSDRAQV ATLGGKPQQN LMYDNWEERG KPFEGIGGGV VGSAN*

ORF84ng and ORF84-1 show 95.4% identity in 395 aa overlap:

                    10        20        30        40        50        60
orf84-1.pep MAEICLITGTPGSGKTLKMVSMMANDEMFKPDENGIRRKVFTNIKGLKIPHTYIETDAKK
            |||||||||||||||||||||||||||||||||||:||||||||||||||||:|||||||
orf84ng     MAEICLITGTPGSGKTLKMVSMMANDEMFKPDENGVRRKVFTNIKGLKIPHTHIETDAKK
                    10        20        30        40        50        60

                    70        80        90       100       110       120
orf84-1.pep LPKSTDEQLSAHDMYEWIKKPENIGSIVIVDEAQDVWPARSAGSKIPENVQWLNTHRHQG
            |||||||||||||||||||||||:|:||||||||||||||||||||||||||||||||||
orf84ng     LPKSTDEQLSAHDMYEWIKKPENVGAIVIVDEAQDVWPARSAGSKIPENVQWLNTHRHQG
                    70        80        90       100       110       120

                   130       140       150       160       170       180
orf84-1.pep IDIFVLTQGPKLLDQNLRTLVRKHYHIASNKMGMRTLLEWKICADDPVKMASSAFSSIYT
            |||||||||||||||||||||::|||||:||||:|||||||:||||||||||||||||||
orf84ng     IDIFVLTQGPKLLDQNLRTLVKRHYHIAANKMGLRTLLEWKVCADDPVKMASSAFSSIYT
                   130       140       150       160       170       180

                   190       200       210       220       230       240
orf84-1.pep LDKKVYDLYESAEVHTVNKVKRSKWFYTLPVIVLLIPVFVGLSYKMLSSYGKKQEEPAAQ
            |||||||||||||:|||||||||||||:||||:||||:|||||||||:||||||||||||
orf84ng     LDKKVYDLYESAEIHTVNKVKRSKWFYALPVIILLIPLFVGLSYKMLGSYGKKQEEPAAQ
                   190       200       210       220       230       240

                   250       260       270       280       290       300
orf84-1.pep ESAATEQQAVLPDKTEGEPVNNGNLTADMFVPTLSEKPESKPIYNGVRQVRTFEYIAGCI
            |||||||||||||||||| ||||||||||||||| |||||||||||||||||||||||||
orf84ng     ESAATEQQAVLPDKTEGESVNNGNLTADMFVPTLPEKPESKPIYNGVRQVRTFEYIAGCI
                   250       260       270       280       290       300

                   310       320       330       340       350       360
orf84-1.pep EGGRTGCACYSHQGTALKEVTELMCKDYVKNGLPFNPYKEESQGQEVQQSAQQHSDRAQV
            |||||||:||||||||||||||||||||||||||||||||||||||||||||||||||||
orf84ng     EGGRTGCTCYSHQGTALKEVTELMCKDYVKNGLPFNPYKEESQGQEVQQSAQQHSDRAQV
                   310       320       330       340       350       360

                   370       380       390
orf84-1.pep ATLGGKPXQNLMYDNWEERGKPFEGIGGGVVGSANX
            ||||||| ||||||||||||||||||||||||||||
orf84ng     ATLGGKPQQNLMYDNWEERGKPFEGIGGGVVGSANX
                   370       380       390

Based on this analysis, including the presence of a putative transmembrane domain
(single-underlined) in the gonococcal protein, and a putative ATP/GTP-binding site motif A (P-loop,
double-underlined), it is predicted that the proteins from N.meningitidis and N.gonorrhoeae, and
their epitopes, could be useful antigens for vaccines or diagnostics, or for raising antibodies.
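
The P-loop assignment can be illustrated with a simple pattern search. The sketch below assumes the PROSITE-style consensus [AG]-x(4)-G-K-[ST] for ATP/GTP-binding site motif A; this section does not state which pattern definition or program was used for the original analysis, and the function name is hypothetical. It scans the first 40 residues of ORF84-1 in two 20-residue fragments.

```python
import re

# Assumed consensus for ATP/GTP-binding site motif A (P-loop): [AG]-x(4)-G-K-[ST].
# The lookahead lets overlapping candidate sites be reported as well.
P_LOOP = re.compile(r"(?=([AG].{4}GK[ST]))")


def find_p_loops(protein: str) -> list[tuple[int, str]]:
    """Return (1-based start, matched residues) for each P-loop candidate."""
    return [(m.start() + 1, m.group(1)) for m in P_LOOP.finditer(protein)]


orf84_1_res_1_20 = "MAEICLITGTPGSGKTLKMV"
orf84_1_res_21_40 = "SMMANDEMFKPDENGIRRKV"
hits_1_20 = find_p_loops(orf84_1_res_1_20)
hits_21_40 = find_p_loops(orf84_1_res_21_40)
print(hits_1_20, hits_21_40)
```

On these fragments the pattern is found once, at residues 9-16 (GTPGSGKT). ORF84ng has the same residues at positions 1-20, so the same site is found in the gonococcal protein.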






Example 39 

The following partial DNA sequence was identified in N.meningitidis <SEQ ID 327>:

1 GTGGTTTTCC TGAATGCCGA CAACGGGATA TTGGTTCAGG ACTTGCCTTT

51 TGAAGTCAAA CTGAAAAAAT TCCATATCGA TTTTTACAAT ACGGGTATGC

101 CGCGTGATTT CGCCAGCGAT ATTGAAGTGA CGGACAAGGC AACCGGTGAG

151 AAACTCGAGC GCACCATCCG CGTGAACCAT CCTTTGACCT TGCACGGCAT

201 CACGATTTAT CAGGCGAGTT TTGCCGACGG CGGTTCGGAT TTGACATTCA

251 AGGCGTGGAA TTTGGGTGAT GCTTCGCGCG AGCCTGTCGT GTTGAAGGCA

301 ACATCCATAC ACCAGTTTCC GTTGGAAATT GGCAAACACA AATATCGTCT

351 TGAGTTCGAT CAGTTCACTT CTATGAATGT GGAGGACATG AGCGAGGGCG

401 CGGAACGGGA AAAAAGCCTG AAATCCACGC TGCCCGATGT CCGCGCCGTT

451 ACTCAGGAAG GTCACAAATA CACCAAT TACCG

501 TATCCGTGAT GCGCCAGGCC AGGCGGTCGA ATATAAAAAC TATATGCTGC

551 CGGTTTTGCA GGAACAGGAT TATTTTTGGA TTACCGGCAC GCGCAGCGC.

601 TTGCAGCAGC AATACCGCTG GCTGCGTATC CCCTTGGACA AGCAGTTGAA

651 AGCGGACACC TTTATGGCAT TGCGTGAGTT TTTGAAAGAT GGGGAAGGGC

701 GCAAACGTCT .GTTGCCGAC GCAACCAAAG GCGCACCTGC CGAAATCCGC

751 GAACAATTCA TGCTGGCTGC GGAAAACACG CTGAACATCT TTGCACAAAA

801 AGGCTATTTG GGATTGGACG AATTTATTAC GTCCAATATC CCGAAAGAGC

851 AGCAGGATAA GATGCAGGGC TATTTCTACG AAATGCTTTA CGGCGTGATG

901 AACGCTGCTT TGGATGAAAC CAT.ACCCGG TACGGCTTGC CCGAATGGCA

951 GCAGGATGAA GCGCGGAATC GTTTCCTGCT GCACAGTATG GATGCGTACA

1001 CGGGTTTGAC CGAATATCCC GCGCCTATGC TGCTGCAACT TGATGGGTTT

1051 TCCGAGGTGC GTTCGTCGGG TTTGCAGATG ACCCGTTCCC C.GGTCCGCT

1101 TTTGGTCTAT CTC...



This corresponds to the amino acid sequence <SEQ ID 328; ORF88>:

1 MVFLNADNGI LVQDLPFEVK LKKFHIDFYN TGMPRDFASD IEVTDKATGE

51 KLERTIRVNH PLTLHGITIY QASFADGGSD LTFKAWNLGD ASREPVVLKA

101 TSIHQFPLEI GKHKYRLEFD QFTSMNVEDM SEGAEREKSL KSTLPDVRAV

151 TQEGHKYTNX XXXXXYRIRD APGQAVEYKN YMLPVLQEQD YFWITGTRSX

201 LQQQYRWLRI PLDKQLKADT FMALREFLKD GEGRKRXVAD ATKGAPAEIR

251 EQFMLAAENT LNIFAQKGYL GLDEFITSNI PKEQQDKMQG YFYEMLYGVM

301 NAALDETXTR YGLPEWQQDE ARNRFLLHSM DAYTGLTEYP APMLLQLDGF

351 SEVRSSGLQM TRSXGPLLVY L...



Further work revealed the complete nucleotide sequence <SEQ ID 329>: 



1 ATGAGTAAAT CCCGTAGATC TCCCCCACTT CTTTCCCGTC CGTGGTTCGC

51 TTTTTTCAGC TCCATGCGCT TTGCAGTCGC TTTGCTCAGT CTGCTGGGTA

101 TTGCATCGGT TATCGGTACG GTGTTGCAGC AAAACCAGCC GCAGACGGAT

151 TATTTGGTCA AATTCGGATC GTTTTGGGCG CAGATTTTTG GTTTTCTGGG

201 ACTGTATGAC GTCTATGCTT CGGCATGGTT TGTCGTTATC ATGATGTTTT

251 TGGTGGTTTC TACCAGTTTG TGCCTGATTC GCAATGTGCC GCCGTTCTGG

301 CGCGAAATGA AGTCTTTTCG GGAAAAGGTT AAAGAAAAAT CTCTGGCGGC

351 GATGCGCCAT TCTTCGCTGT TGGATGTAAA AATTGCGCCC GAGGTTGCCA

401 AACGTTATCT GGAAGTACAA GGTTTTCAGG GAAAAACCAT TAACCGTGAA

451 GACGGGTCGG TTCTGATTGC CGCCAAAAAA GGCACAATGA ACAAATGGGG

501 CTATATCTTT GCCCATGTTG CTTTGATTGT CATTTGCCTG GGCGGGTTGA

551 TAGACAGTAA CCTGCTGTTG AAACTGGGTA TGCTGACCGG TCGGATTGTT

601 CCGGACAATC AGGCGGTTTA TGCCAAGGAT TTCAAGCCCG AAAGTATTTT

651 GGGTGCGTCC AATCTCTCAT TTAGGGGCAA CGTCAATATT TCCGAGGGGC

701 AGAGTGCGGA TGTGGTTTTC CTGAATGCCG ACAACGGGAT ATTGGTTCAG

751 GACTTGCCTT TTGAAGTCAA ACTGAAAAAA TTCCATATCG ATTTTTACAA

801 TACGGGTATG CCGCGTGATT TCGCCAGCGA TATTGAAGTG ACGGACAAGG

851 CAACCGGTGA GAAACTCGAG CGCACCATCC GCGTGAACCA TCCTTTGACC

901 TTGCACGGCA TCACGATTTA TCAGGCGAGT TTTGCCGACG GCGGTTCGGA

951 TTTGACATTC AAGGCGTGGA ATTTGGGTGA TGCTTCGCGC GAGCCTGTCG

1001 TGTTGAAGGC AACATCCATA CACCAGTTTC CGTTGGAAAT TGGCAAACAC

1051 AAATATCGTC TTGAGTTCGA TCAGTTCACT TCTATGAATG TGGAGGACAT

1101 GAGCGAGGGC GCGGAACGGG AAAAAAGCCT GAAATCCACG CTGAACGATG

1151 TCCGCGCCGT TACTCAGGAA GGTAAAAAAT ACACCAATAT CGGCCCTTCC

1201 ATTGTTTACC GTATCCGTGA TGCGGCAGGG CAGGCGGTCG AATATAAAAA

1251 CTATATGCTG CCGGTTTTGC AGGAACAGGA TTATTTTTGG ATTACCGGCA

1301 CGCGCAGCGG CTTGCAGCAG CAATACCGCT GGCTGCGTAT CCCCTTGGAC

1351 AAGCAGTTGA AAGCGGACAC CTTTATGGCA TTGCGTGAGT TTTTGAAAGA

1401 TGGGGAAGGG CGCAAACGTC TGGTTGCCGA CGCAACCAAA GGCGCACCTG

1451 CCGAAATCCG CGAACAATTC ATGCTGGCTG CGGAAAACAC GCTGAACATC

1501 TTTGCACAAA AAGGCTATTT GGGATTGGAC GAATTTATTA CGTCCAATAT

1551 CCCGAAAGAG CAGCAGGATA AGATGCAGGG CTATTTCTAC GAAATGCTTT

1601 ACGGCGTGAT GAACGCTGCT TTGGATGAAA CCATACGCCG GTACGGCTTG

1651 CCCGAATGGC AGCAGGATGA AGCGCGGAAT CGTTTCCTGC TGCACAGTAT

1701 GGATGCGTAC ACGGGTTTGA CCGAATATCC CGCGCCTATG CTGCTGCAAC

1751 TTGATGGGTT TTCCGAGGTG CGTTCGTCGG GTTTGCAGAT GACCCGTTCC

1801 CCGGGTGCGC TTTTGGTCTA TCTCGGCTCG GTGCTGTTGG TATTGGGTAC

1851 GGTATTGATG TTTTATGTGC GCGAAAAACG GGCGTGGGTA TTGTTTTCAG

1901 ACGGCAAAAT CCGTTTTGCC ATGTCTTCGG CCCGCAGCGA ACGGGATTTG

1951 CAGAAGGAAT TTCCAAAACA CGTCGAGAGT CTGCAACGGC TCGGCAAGGA

2001 CTTGAATCAT GACTGA



This corresponds to the amino acid sequence <SEQ ID 330; ORF88-1>:



1 MSKSRRSPPL LSRPWFAFFS SMRFAVALLS LLGIASVIGT VLQQNQPQTD

51 YLVKFGSFWA QIFGFLGLYD VYASAWFVVI MMFLVVSTSL CLIRNVPPFW

101 REMKSFREKV KEKSLAAMRH SSLLDVKIAP EVAKRYLEVQ GFQGKTINRE

151 DGSVLIAAKK GTMNKWGYIF AHVALIVICL GGLIDSNLLL KLGMLTGRIV

201 PDNQAVYAKD FKPESILGAS NLSFRGNVNI SEGQSADVVF LNADNGILVQ

251 DLPFEVKLKK FHIDFYNTGM PRDFASDIEV TDKATGEKLE RTIRVNHPLT

301 LHGITIYQAS FADGGSDLTF KAWNLGDASR EPVVLKATSI HQFPLEIGKH

351 KYRLEFDQFT SMNVEDMSEG AEREKSLKST LNDVRAVTQE GKKYTNIGPS

401 IVYRIRDAAG QAVEYKNYML PVLQEQDYFW ITGTRSGLQQ QYRWLRIPLD

451 KQLKADTFMA LREFLKDGEG RKRLVADATK GAPAEIREQF MLAAENTLNI

501 FAQKGYLGLD EFITSNIPKE QQDKMQGYFY EMLYGVMNAA LDETIRRYGL

551 PEWQQDEARN RFLLHSMDAY TGLTEYPAPM LLQLDGFSEV RSSGLQMTRS

601 PGALLVYLGS VLLVLGTVLM FYVREKRAWV LFSDGKIRFA MSSARSERDL

651 QKEFPKHVES LQRLGKDLNH D*



Computer analysis of this amino acid sequence gave the following results: 
Homology with a predicted ORF from N.meningitidis (strain A) 

ORF88 shows 95.7% identity over a 371aa overlap with an ORF (ORF88a) from strain A of N. 
meningitidis: 



orfBB.pep 
orf88a 

orf88.pep 
orf88a 

orf 88 .pep 
orf88a 

orf 88 .pep 
orf88a 

orf 88 .pep 
orfSSa 

orf 88 .pep 
orf88a 

orf 88 .pep 
orf88a 

orf88a 



10 20 30 

MVFLNADNGILVQDLPFEVKLKKFHIDFYN 

: I I I I [ I I t I It i I I t I I I I I I I I I I I I I I 
AKDFKPESILGASNLSFRGNVNISEGQSADWFLNADNGILVQDLPFEVKLKKFHIDFYN 
210 220 230 240 250 260 

40 50 60 70 80 90 

TGMPRDFASDIEVTDKATGEKLERTIRVNHPLTLHGITIYQASFADGGSDLTFKAWNLGD 

MMnililltltllllltllMllitllDtltlillMtlltillllllilllltll 
TGMPRDFASDIEVTDKATGEKLERTIRVNHPLTLHGITIYQASFADGGSDLTFKAWNLGD 
270 280 290 300 310 320 

100 110 120 130 140 150 

ASREPWLKATSIHQFPLEIGKHKYRLEFDQFTSMNVEDMSEGAEREKSLKSTLPDVRAV 
i I I I I I I I I I I I I I I t I I I I I M I t t I I I i I I M I I I t t I I t I I ) I I I I t t I I t I I I I I 
ASREPWLKATSIHQFPLEIGKHKYRLEFDQFTSMNVEDMSEGAEREKSLKSTLNDVRAV 
330 340 350 360 370 380 

160 170 180 190 200 210 

TQEGHKYTNXXXXXXYRIRDAPGQAVEYKNYMLPVLQEQDYFWITGTRSXLQQQYRWLRI 
lllliMM llllil I I I I I I I I 1 1 t I I M I I I I I M I I I I I tlllllllll 

TQEGKKYTNIGPSIVYRIRDAAGQAVEYKNYMLPVLQEQDYFWITGTRSGLQQQYRWLRI 
390 400 410 420 430 440 

220 230 240 250 250 270 

PLDKQLKADTFMALREFLKDGEGRKRXVADATKGAPAEIREQFMLAAENTLNIFAQKGYL 

MltlllltltllllillllllMII ttlllilMlltllliillMIMMIIIIIII 
PLDKQLKADTFMALREFLKDGEGRKRLVADATKGAPAEIREQFMLAAENTLNIFAQKGYL 
450 460 470 480 490 500 

280 290 300 310 320 330 

GLDEFITSNIPKEQQDKMQGYFYEMLYGVMNAALDETXTRYGLPEWQQDEARNRFLLHSM 

llltllltlllMIIIIIMIIIIIIIillllllllt MMtllllMIIIIIMIII 
GLDEFITSNIPKEQQDKMQGYFYEMLYGVMNAALDETIRRYGLPEWQQDEARNRFLLHSM 
510 520 530 540 550 560 

340 350 360 370 

DAYTGLTEYPAPMLLQLDGFSEVRSSGLQMTRSXGP LLVYL 
I I I I i I I I i I I I I M I i M I I I I I I I I I I I I I I I I I I I I 

DAYTGLTEYPAPMLLQLDGFSEVRSSGLQMTRSPG ALLVYLGSVLLVLGTVLM FYVREKR 
570 580 590 600 610 620 

AWVLFSDGKIRFAMSSARSERDLQKEFPKHVESLQRLGKDLNHDX 
630 640 650 660 670 



The complete length ORF88a nucleotide sequence <SEQ ID 331> is: 

1 ATGAGTAAAT CCCGTAGATC TCCCCCACTT CTTTCCCGTC CGTGGTTCGC 
51 TTTTTTCAGC TCCATGCGCT TTGCGGTCGC TTTGCTCAGT CTGCTGGGTA 
101 TTGCATCGGT TATCGGTACG GTGTTGCAGC AAAACCAGCC GCAGACGGAT 
151 TATTTGGTCA AATTCGGATC GTTTTGGGCG CAGATTTTTG GTTTTCTGGG 

201 ACTGTATGAC GTCTATGCTT CGGCATGGTT TGTCGTTATC ATGATGTTTT 

251 TGGTGGTTTC TACCAGTTTG TGCCTGATTC GCAATGTGCC GCCGTTCTGG 

301 CGCGAAATGA AGTCTTTTCG GGAAAAGGTT AAAGAAAAAT CTCTGGCGGC 

351 GATGCGCCAT TCTTCGCTGT TGGATGTAAA AATTGCGCCC GAGGTTGCCA 

401 AACGTTATCT GGAAGTACAA GGTTTTCAGG GAAAAACCAT TAACCGTGAA 

451 GACGGGTCGG TTCTGATTGC CGCCAAAAAA GGCACAATGA ACAAATGGGG 

501 CTATATCTTT GCCCATGTTG CTTTGATTGT CATTTGCCTG GGCGGGTTGA 

551 TAGACAGTAA CCTGCTGTTG AAACTGGGTA TGCTGACCGG TCGGATTGTT 

601 CCGGACAATC AGGCGGTTTA TGCCAAGGAT TTCAAGCCCG AAAGTATTTT 

651 GGGTGCGTCC AATCTCTCAT TTAGGGGCAA CGTCAATATT TCCGAGGGGC 

701 AGAGTGCGGA TGTGGTTTTC CTGAATGCCG ACAACGGGAT ATTGGTTCAG 

751 GACTTGCCTT TTGAAGTCAA ACTGAAAAAA TTCCATATCG ATTTTTACAA 

801 TACGGGTATG CCGCGCGATT TTGCCAGTGA TATTGAAGTA ACGGATAAGG 

851 CAACCGGTGA GAAACTCGAG CGCACCATCC GCGTGAACCA TCCTTTGACC 

901 TTGCACGGCA TCACGATTTA TCAGGCGAGT TTTGCCGACG GCGGTTCGGA 

951 TTTGACATTC AAGGCGTGGA ATTTGGGTGA TGCTTCGCGC GAGCCTGTCG 

1001 TGTTGAAGGC AACATCCATA CACCAGTTTC CGTTGGAAAT TGGCAAACAC 

1051 AAATATCGTC TTGAGTTCGA TCAGTTTACT TCTATGAATG TGGAGGACAT 

1101 GAGCGAGGGC GCGGAACGGG AAAAAAGCCT GAAATCCACG CTGAACGATG 

1151 TCCGCGCCGT TACTCAGGAA GGTAAAAAAT ACACCAATAT CGGCCCTTCC 

1201 ATTGTTTACC GTATCCGTGA TGCGGCAGGG CAGGCGGTCG AATATAAAAA 

1251 CTATATGCTG CCGGTTTTGC AGGAACAGGA TTATTTTTGG ATTACCGGCA 

1301 CGCGCAGCGG CTTGCAGCAG CAATACCGCT GGCTGCGTAT CCCCTTGGAC 

1351 AAGCAGTTGA AAGCGGACAC CTTTATGGCA TTGCGTGAGT TTTTGAAAGA 

1401 TGGGGAAGGG CGCAAACGTC TGGTTGCCGA CGCAACCAAA GGCGCACCTG 

1451 CCGAAATCCG CGAACAATTC ATGCTGGCTG CGGAAAACAC GCTGAACATC 

1501 TTTGCACAAA AAGGCTATTT GGGATTGGAC GAATTTATTA CGTCCAATAT 

1551 CCCGAAAGAG CAGCAGGATA AGATGCAGGG CTATTTCTAC GAAATGCTTT 

1601 ACGGCGTGAT GAACGCTGCT TTGGATGAAA CCATACGCCG GTACGGCTTG 

1651 CCCGAATGGC AGCAGGATGA AGCGCGGAAT CGTTTCCTGC TGCACAGTAT 

1701 GGATGCGTAC ACGGGTTTGA CCGAATATCC CGCGCCTATG CTGCTGCAAC 

1751 TTGATGGGTT TTCCGAGGTG CGTTCGTCGG GTTTGCAGAT GACCCGTTCC 

1801 CCGGGTGCGC TTTTGGTCTA TCTCGGCTCG GTGCTGTTGG TATTGGGTAC 

1851 GGTATTGATG TTTTATGTGC GCGAAAAACG GGCGTGGGTA TTGTTTTCAG 

1901 ACGGCAAAAT CCGTTTTGCC ATGTCTTCGG CCCGCAGCGA ACGGGATTTG 

1951 CAGAAGGAAT TTCCAAAACA CGTCGAGAGT CTGCAACGGC TCGGCAAGGA 

2001 CTTGAATCAT GACTGA 

This encodes a protein having amino acid sequence <SEQ ID 332>: 

1 MSKSRRSPPL LSRPWFAFFS SMRFAVALLS LLGIASVIGT VLQQNQPQTD 
51 YLVKFGSFWA QIFGFLGLYD VYASAWFVVI MMFLVVSTSL CLIRNVPPFW 
101 REMKSFREKV KEKSLAAMRH SSLLDVKIAP EVAKRYLEVQ GFQGKTINRE 
151 DGSVLIAAKK GTMNKWGYIF AHVALIVICL GGLIDSNLLL KLGMLTGRIV 
201 PDNQAVYAKD FKPESILGAS NLSFRGNVNI SEGQSADVVF LNADNGILVQ 
251 DLPFEVKLKK FHIDFYNTGM PRDFASDIEV TDKATGEKLE RTIRVNHPLT 
301 LHGITIYQAS FADGGSDLTF KAWNLGDASR EPVVLKATSI HQFPLEIGKH 
351 KYRLEFDQFT SMNVEDMSEG AEREKSLKST LNDVRAVTQE GKKYTNIGPS 
401 IVYRIRDAAG QAVEYKNYML PVLQEQDYFW ITGTRSGLQQ QYRWLRIPLD 
451 KQLKADTFMA LREFLKDGEG RKRLVADATK GAPAEIREQF MLAAENTLNI 
501 FAQKGYLGLD EFITSNIPKE QQDKMQGYFY EMLYGVMNAA LDETIRRYGL 
551 PEWQQDEARN RFLLHSMDAY TGLTEYPAPM LLQLDGFSEV RSSGLQMTRS 
601 PGALLVYLGS VLLVLGTVLM FYVREKRAWV LFSDGKIRFA MSSARSERDL 
651 QKEFPKHVES LQRLGKDLNH D* 

ORF88a and ORF88-1 show 100.0% identity in 671 aa overlap: 

orf88a.pep  MSKSRRSPPLLSRPWFAFFSSMRFAVALLSLLGIASVIGTVLQQNQPQTDYLVKFGSFWA 60 
            |||||||||||||||||||||||||||||||||||||||||||||||||||||||||||| 
orf88-1     MSKSRRSPPLLSRPWFAFFSSMRFAVALLSLLGIASVIGTVLQQNQPQTDYLVKFGSFWA 60 

orf88a.pep  QIFGFLGLYDVYASAWFVVIMMFLVVSTSLCLIRNVPPFWREMKSFREKVKEKSLAAMRH 120 
            |||||||||||||||||||||||||||||||||||||||||||||||||||||||||||| 
orf88-1     QIFGFLGLYDVYASAWFVVIMMFLVVSTSLCLIRNVPPFWREMKSFREKVKEKSLAAMRH 120 

orf88a.pep  SSLLDVKIAPEVAKRYLEVQGFQGKTINREDGSVLIAAKKGTMNKWGYIFAHVALIVICL 180 
            |||||||||||||||||||||||||||||||||||||||||||||||||||||||||||| 
orf88-1     SSLLDVKIAPEVAKRYLEVQGFQGKTINREDGSVLIAAKKGTMNKWGYIFAHVALIVICL 180 

orf88a.pep  GGLIDSNLLLKLGMLTGRIVPDNQAVYAKDFKPESILGASNLSFRGNVNISEGQSADVVF 240 
            |||||||||||||||||||||||||||||||||||||||||||||||||||||||||||| 
orf88-1     GGLIDSNLLLKLGMLTGRIVPDNQAVYAKDFKPESILGASNLSFRGNVNISEGQSADVVF 240 

orf88a.pep  LNADNGILVQDLPFEVKLKKFHIDFYNTGMPRDFASDIEVTDKATGEKLERTIRVNHPLT 300 
            |||||||||||||||||||||||||||||||||||||||||||||||||||||||||||| 
orf88-1     LNADNGILVQDLPFEVKLKKFHIDFYNTGMPRDFASDIEVTDKATGEKLERTIRVNHPLT 300 

orf88a.pep  LHGITIYQASFADGGSDLTFKAWNLGDASREPVVLKATSIHQFPLEIGKHKYRLEFDQFT 360 
            |||||||||||||||||||||||||||||||||||||||||||||||||||||||||||| 
orf88-1     LHGITIYQASFADGGSDLTFKAWNLGDASREPVVLKATSIHQFPLEIGKHKYRLEFDQFT 360 

orf88a.pep  SMNVEDMSEGAEREKSLKSTLNDVRAVTQEGKKYTNIGPSIVYRIRDAAGQAVEYKNYML 420 
            |||||||||||||||||||||||||||||||||||||||||||||||||||||||||||| 
orf88-1     SMNVEDMSEGAEREKSLKSTLNDVRAVTQEGKKYTNIGPSIVYRIRDAAGQAVEYKNYML 420 

orf88a.pep  PVLQEQDYFWITGTRSGLQQQYRWLRIPLDKQLKADTFMALREFLKDGEGRKRLVADATK 480 
            |||||||||||||||||||||||||||||||||||||||||||||||||||||||||||| 
orf88-1     PVLQEQDYFWITGTRSGLQQQYRWLRIPLDKQLKADTFMALREFLKDGEGRKRLVADATK 480 

orf88a.pep  GAPAEIREQFMLAAENTLNIFAQKGYLGLDEFITSNIPKEQQDKMQGYFYEMLYGVMNAA 540 
            |||||||||||||||||||||||||||||||||||||||||||||||||||||||||||| 
orf88-1     GAPAEIREQFMLAAENTLNIFAQKGYLGLDEFITSNIPKEQQDKMQGYFYEMLYGVMNAA 540 

orf88a.pep  LDETIRRYGLPEWQQDEARNRFLLHSMDAYTGLTEYPAPMLLQLDGFSEVRSSGLQMTRS 600 
            |||||||||||||||||||||||||||||||||||||||||||||||||||||||||||| 
orf88-1     LDETIRRYGLPEWQQDEARNRFLLHSMDAYTGLTEYPAPMLLQLDGFSEVRSSGLQMTRS 600 

orf88a.pep  PGALLVYLGSVLLVLGTVLMFYVREKRAWVLFSDGKIRFAMSSARSERDLQKEFPKHVES 660 
            |||||||||||||||||||||||||||||||||||||||||||||||||||||||||||| 
orf88-1     PGALLVYLGSVLLVLGTVLMFYVREKRAWVLFSDGKIRFAMSSARSERDLQKEFPKHVES 660 

orf88a.pep  LQRLGKDLNHD 672 
            ||||||||||| 
orf88-1     LQRLGKDLNHD 672 



Homology with a predicted ORF from N.gonorrhoeae 

ORF88 shows 93.8% identity over a 371aa overlap with a predicted ORF (ORF88.ng) from N. 
gonorrhoeae: 



orf 88 .pep 
orf88ng 
orf 88 .pep 
orfSSng 
orf 88 .pep 
orfSSng 
orf 88 -pep 
orfSSng 
orf 88 .pep 
orfSSng 
orf 88 . pep 
orfSSng 
orf 88 .pep 
orfSSng 



MVFLNADNGILVQDLPFEVKLBCKFHIDFYNTGMPRDFASDIEVTDKATGEKLERTIRVNH 60 
I I I I I I I I I : 11 t I t I I I t t I I I I I I I I I I I I I I I I I II M I I I 1 I I I I I I I t i I I I I It 

MVFLNADNGMLVQDLPFEVKLKKFH I DFYNTGMPRDFAS DIE VT DKATGEKLERT IRVNH 6 0 

PLTLHGITIYQASFADGGSDLTFKAWNLGDASREPWLKATSIHQFPLEIGKHKYRLEFD 120 
I I I I I I I I I 1 I I I I I I I I I I I I II I I II I I I I I I I I I I I I I t I I I I I I I I I I I I I I I I I 

PLTLHGITIYQASFADGGSDLTFKAWNLRDASREPWLBCATSIHQFPLEIGKHKYRLEFD 120 



QFTSMNVEDMSEGAEREKSLKSTLPDVRAVTQEGHKYTNXXXXXXYRIRDAPGQAVEYKN 
I I I I t I I I I I M II t I I I 1 I 1 I I I 111111111:1111 I I I I I I I I I I I I I t 

QFTSMNVEDMSEGAEREKSLKSTLNDVRAVTQEGKKYTNIGPSIVYRIRDAAGQAVEYKN 

YMLPVLQEQDYFWITGTRSXLQQQYRWLRIPLDKQLKADTFMALREFLKDGEGRKRXVAD 
tlll:tt::|ll|:lllll I It I I I I I I I t I I t I t I I I I I I I I I I I I t I I t I I I I III 
YMLPILQDKDYFWLTGTRSGLQQQYRWLRIPLDKQLKADTFMALREFLKDGEGRKRLVAD 

ATKGAPAEIREQFMLAAENTLNIFAQKGYLGLDEFITSNIPKEQQDKMQGYFYEMLYGVM 
III t I I t I t I I II I I I I I I I I I I t t I I I t I t I I I I I t I I I I I I I M I I I I I t I I I i I I 
ATKDAPAEIREQFMLAAENTLNIFAQKGYLGLDEFITSNIPKGQQDKMQGYFYEMLYGVM 



180 



180 



240 



240 



300 



300 



360 



NAALDETXTRYGLPEWQQDEARNRFLLHSMDAYTGLTEYPAPMLLQLDGFSEVRSSGLQM 
I I I I I I I I I II I t I t I I I M I I I I I t I I I I I I I I t I I I I I I i I I I I 1 I I I I I I I I M I 
NAALDETIRRYGLPEWQQDEARNRFLLHSMDAYTGLTEYPAPMLLQLDGFSEVRSSGLQM 360 

TRSXGPLLVYL 371 
til I I I I I I 

TRSPGALLVYLGSVLLVLGTVFMFYVPKKRAWVLFSNXKIRFAMSSARSERDLQKEFPKH 420 
An ORF88ng nucleotide sequence <SEQ ID 333> was predicted to encode a protein having amino 
acid sequence <SEQ ID 334>: 

1 MVKLNADNGM LVQDLPFEVK LKKFHIDFYN TGMPRDFASD IEVTDKATGE 
51 KLERTIRVNH PLTLHGITIY QASFADGGSD LTFKAWNLRD ASREPVVLKA 
101 TSIHQFPLEI GKHKYRLEFD QFTSMNVEDM SEGAEREKSL KSTLNDVRAV 
151 TQEGKKYTNI GPSIVYRIRD AAGQAVEYKN YMLPILQDKD YFWLTGTRSG 
201 LQQQYRWLRI PLDKQLKADT FMALREFLKD GEGRKRLVAD ATKDAPAEIR 
251 EQFMLAAENT LNIFAQKGYL GLDEFITSNI PKGQQDKMQG YFYEMLYGVM 
301 NAALDETIRR YGLPEWQQDE ARNRFLLHSM DAYTGLTEYP APMLLQLDGF 
351 SEVRSSGLQM TRSPGALLVY LGSVLLVLGT VFMFYVPKKR AWVLFSNXKI 
401 RFAMSSARSE RDLQKEFPKH VESLQRLGKD LNHD* 

Further work revealed the complete gonococcal DNA sequence <SEQ ID 335>: 

1 ATGAGTAAAT CCCGTATATC TCCCACACTT CTTTCCCGTC CGTGGTTCGC 

51 TTTTTTCAGC TCCATGCGCT TTGCGGTCGC TTTGCTCAGT CTGCTGGGTA 

101 TTGCATCGGT TATCGGCACG GTGTTACAGC AAAACCAGCC GCAGACGGAT 

151 TATTTGGTCA AATTCGGACC GTTTTGGACT CGGATTTTTG ATTTTTTGGG 

201 TTTGTATGAT GTCTATGCTT CGGCATGGTT TGTCGTTATC ATGATGTTTC 

251 TGGTGGTTTC TACCAGTTTG TGTTTAATCC GTAACGTTCC GCCGTTTTGG 

301 CGCGAAATGA AGTCTTTCCG GGAAAAGGTT AAAGAAAAAT CTCTGGCGGC 

351 GATGCGCCAT TCTTCGCTGT TGGATGTAAA AATTGCCCCC GAAGTTGCCA 

401 AACGTTATCT GGAGGTGCGG GGTTTTCAGG GAAAAACCGT CAGCCGTGAG 

451 GACGGGTCGG TTCTGATTGC CGCCAAAAAA GGCAcaatga acaaATGGGG 

501 CTATATCTTT GCccaagtag ctTTGATTGT CATTTGCCTG GGCGGGTTGA 

551 TAGACAGTAA CCTGCTGCTG AAGCTGGGTA TGCTGGCCGG TCGGATTGTT 

601 CCGGACAATC AGGCGGTTTA TGCCAAGGAT TTCAAGCCCG AAAGTATTTT 

651 GGGTGCGTCC AATCTCTCAT TTAGGGGCAA CGTCAATATT TCCGAGGGGC 

701 AAAGTGCGGA TGTGGTTTTC CTGAATGCCG ACAACGGGAT GTTGGTTCAG 

751 GACTTGCCTT TTGAAGTCAA ACTGAAAAAA TTCCATATCG ATTTTTACAA 

801 TACGGGTATG CCGCGCGATT TTGCCAGCGA TATTGAAGTA ACGGACAAGG 

851 CAACCGGTGA GAAACTCGAG CGCACCATCC GCGTGAACCA TCCTTTGACC 

901 TTGCACGGCA TCACGATTTA TCAGGCGAGT TTTGCCGACG GCGGTTCGGA 

951 TTTGACATTC AAGGCGTGGA ATTTGAGGGA TGCTTCGCGC GAACCTGTCG 

1001 TGTTGAAGGC AACCTCCATA CACCAGTTTC CGTTGGAAAT CGGCAAACAC 

1051 AAATATCGTC TTGAGTTCGA TCAGTTCACT TCTATGAATG TGGAGGACAT 

1101 GAGCGAGGGT GCGGAACGGG AAAAAAGCCT GAAATCCACT CTGAACGATG 

1151 TCCGCGCCGT TACTCAGGAA GGTAAAAAAT ACACCAATAT CGGCCCTTCC 

1201 ATCGTGTACC GCATCCGTGA TGcggCAGGG CAGGCGGTCG AATATAAAAA 

1251 CTATATGCTG CCGATTTTGC AGGACAAAGA TTATTTTTGG CTGACCGGCA 

1301 CGCGCAGCGG CTTGCAGCAG CAATACCGCT GGCTGCGTAT CCCCTTGGAC 

1351 AAGCAGTTGA AAGCGGACAC CTTTATGGCA TTGCGTGAGT TTTTGAAAGA 

1401 TGGGGAAGGG CGCAAACGTC TGGTTGCCGA CGCAACCAAA GACGCACCTG 

1451 CCGAAATCCG CGAACAATTC ATGCTGGCTG CGGAAAACAC GCTGAATATC 

1501 TTTGCGCAAA AAGGCTATTT GGGATTGGAC GAATTTATTA CGTCCAATAT 

1551 CCCGAAAGGG CAGCAGGATA AGATGCAGGG CTATTTCTAC GAAATGCTTT 

1601 ACGGCGTGAT GAACGCTGCT TTGGATGAAA CCATACGCCG GTACGGCTTG 

1651 CCCGAATGGC AGCAGGATGA AGCGCGGAAC CGTTTCCTGC TGCACAGTAT 

1701 GGATGCCTAT ACGGGGCTGA CGGAATATCC CGCGCCTATG CTGCTCCAGC 

1751 TTGACGGGTT TTCCGAGGTG CGTTCCTCAG GTTTGCAGAT GACCCGTTCG 

1801 CCGGGTGCGC TTTTGGTCTA TCtcggctcg gtattgttgg TTTTGGgtac 

1851 ggtaTttatg tTTTATGTGC GCGAAAAACG GGCGTGGgta tTGTTTTCag 

1901 aCGGCAAAAT CCGTTTTGCT ATGtCTTcgg CCcgcagcga ACGGGATTTG 

1951 cAGAaggaaT TTCCAAAACA CGtcgAGAGC CTGCAACggc tcggcaaggA 

2001 CttgaaTCAT GACTga 

This corresponds to the amino acid sequence <SEQ ID 336; ORF88ng-1>: 

1 MSKSRISPTL LSRPWFAFFS SMRFAVALLS LLGIASVIGT VLQQNQPQTD 
51 YLVKFGPFWT RIFDFLGLYD VYASAWFVVI MMFLVVSTSL CLIRNVPPFW 
101 REMKSFREKV KEKSLAAMRH SSLLDVKIAP EVAKRYLEVR GFQGKTVSRE 
151 DGSVLIAAKK GTMNKWGYIF AQVALIVICL GGLIDSNLLL KLGMLAGRIV 
201 PDNQAVYAKD FKPESILGAS NLSFRGNVNI SEGQSADVVF LNADNGMLVQ 
251 DLPFEVKLKK FHIDFYNTGM PRDFASDIEV TDKATGEKLE RTIRVNHPLT 
301 LHGITIYQAS FADGGSDLTF KAWNLRDASR EPVVLKATSI HQFPLEIGKH 
351 KYRLEFDQFT SMNVEDMSEG AEREKSLKST LNDVRAVTQE GKKYTNIGPS 
401 IVYRIRDAAG QAVEYKNYML PILQDKDYFW LTGTRSGLQQ QYRWLRIPLD 
451 KQLKADTFMA LREFLKDGEG RKRLVADATK DAPAEIREQF MLAAENTLNI 
501 FAQKGYLGLD EFITSNIPKG QQDKMQGYFY EMLYGVMNAA LDETIRRYGL 
551 PEWQQDEARN RFLLHSMDAY TGLTEYPAPM LLQLDGFSEV RSSGLQMTRS 
601 PGALLVYLGS VLLVLGTVFM FYVREKRAWV LFSDGKIRFA MSSARSERDL 
651 QKEFPKHVES LQRLGKDLNH D* 

ORF88ng-l and ORF88-1 show 97.0% identity in 671 aa overlap: 



orf 88-1 . pep MSKSRRSPPLLSRPWFAFFSSMRFAVALLSLLGIASVIGTVLQQNQPQTDYLVKFGSFWA 60 

lltll II Itllillltllllilllllllltllllllllllltllllllllllll II: 

orf88ng-l MSKSRISPTLLSRPWFAFFSSMRFAVALLSLLGIASVIGTVLQQNQPQTDYLVKFGPFWT 60 

orf 88-1 . pep QIFGFLGLYDVYASAWFVVIMMFLVVSTSLCLIRNVPPFWREMKSFREKVKEKSLAAMRH 120 

: II I I I I I I t II II I I I I I II II I I II II II t II I II I I It i I I I I I I I I I I I 1 t I I I t 

orf88ng-1 RIFDFLGLYDVYASAWFVVIMMFLVVSTSLCLIRNVPPFWREMKSFREKVKEKSLAAMRH 120 

orf 88-1 . pep SSLLDVKIAPEVAKRYLEVQGFQGKTINREDGSVLIAAKKGTMNKWGYIFAHVALIVICL 180 

I II t I I I I I t I I t t I t i M: I I I I II: : 11 t t I I I I 1 II I I I I I tl II I II : i I II II II 

orf88ng-l SSLLDVKIAPEVAKRYLEVRGFQGKTVSREDGSVLIAAKKGTMNKWGYIFAQVALIVICL 180 

orf 88-1 . pep GGLIDSNLLLKLGMLTGRIVPDNQAVYAKDFKPESILGASNLSFRGNVNISEGQSADVVF 240 

||||||||||iilll:lllllltlllllllllllllllllltllllllIlllliMMII 

orf88ng-1 GGLIDSNLLLKLGMLAGRIVPDNQAVYAKDFKPESILGASNLSFRGNVNISEGQSADVVF 240 

orf 88-1 . pep LNADNGILVQDLPFEVKLKKFHIDFYNTGMPRDFASDIEVTDKATGEKLERTIRVNHPLT 300 

||||||:llllllllllllllllllllllllllllllllllllllllltllllllllltl 

orf88ng-l LNADNGMLVQDLPFEVKLKKFHIDFYNTGMPRDFASDIEVTDKATGEKLERTIRVNHPLT 300 

orf 88-1 . pep LHGITIYQASFADGGSDLTFKAWNLGDASREPVVLKATSIHQFPLEIGKHKYRLEFDQFT 360 

I II t I I I I II I I I I I I I I I I I I I I I I I I I I i t I I I I I I I I I I I I I I I II I I I M M I I I 

orf88ng-1 LHGITIYQASFADGGSDLTFKAWNLRDASREPVVLKATSIHQFPLEIGKHKYRLEFDQFT 360 

orf 88-1 .pep SMNVEDMSEGAEREKSLKSTLNDVRAVTQEGKKYTNIGPSIVYRIRDAAGQAVEYKNYML 420 

ItltllllMlllllllllllltlllllllllllllllllllllllllllltllllllll 

orf88ng-l SMNVEDMSEGAEREKSLKSTLNDVRAVTQEGKKYTNIGPSIVYRIRDAAGQAVEYKNYML 420 

orf 88-1 . pep PVLQEQDYFWITGTRSGLQQQYRWLRIPLDKQLKADTFMALREFLKDGEGRKRLVADATK 480 

I : I |: : I M I: I I { I I I I I I I I I I M I I I II I I I I I II I I II I I I II I I I I I I II II II i 

orf88ng-l PILQDKDYFWLTGTRSGLQQQYRWLRIPLDKQLKADTFMALREFLKDGEGRKRLVADATK 480 

orf 88-1 . pep GAPAEIREQFMLAAENTLNIFAQKGYLGLDEFITSNIPKEQQDKMQGYFYEMLYGVMNAA 540 

II II I I I I I I I I It I I I I II i I I I I I I I I I I I I I I I I I 1 I I I I I I I I It I I I I I I [ I I 
orf88ng-l DAPAEIREQFMLAAENTLNIFAQKGYLGLDEFITSNIPKGQQDKMQGYFYEMLYGVMNAA 540 

orf 88-1 . pep LDETIRRYGLPEWQQDEARNRFLLHSMDAYTGLTEYPAPMLLQLDGFSEVRSSGLQMTRS 600 

1) I I I ) II I I II I It I I II I I I I I I I I I I I I I I I I I I I 1 I I I I I I I t I I I t I I I t I I II I 

orf88ng-l LDETIRRYGLPEWQQDEARNRFLLHSMDAYTGLTEYPAPMLLQLDGFSEVRSSGLQMTRS 600 

orf 88-1 . pep PGALLVYLGSVLLVLGTVLMFYVREKRAWVLFSDGKIRFAMSSARSERDLQKEFPKHVES 660 

I I [ II II I I II t I I I I I I : I i I I I I I M i I I I I I I 1 I I I I II I t i I I I t I I 1 I 1 I I I I I I 

orf88ng-1 PGALLVYLGSVLLVLGTVFMFYVREKRAWVLFSDGKIRFAMSSARSERDLQKEFPKHVES 660 



orf 88-1. pep LQRLGKDLNHD 671 

1 I I II I I I I tl 
orf88ng-l LQRLGKDLNHD 671 

Furthermore, ORF88ng-1 shows homology with a hypothetical protein from Aquifex aeolicus: 

gi|2984296 (AE000771) hypothetical protein [Aquifex aeolicus] Length = 537 
Score = 94.4 bits (231), Expect = 2e-18 

Identities = 91/334 (27%), Positives = 159/334 (47%), Gaps = 59/334 (17%) 



Query: 16 FAFFSSMRFAVALLSLLGIASVIG-TVLQQNQPQTDYLVKFGPFWTRIFDFLGLYDVYAS 74 

+ F +S++ A+ ++ +LGI S++G T ++QNQ YL +FG L L DV+ S 

Sbjct: 80 YDFLASLKLAIFIMLVLGILSMLGSTYIKQNQSFEWYLDQFGYDVGIWIWKLWLNDVFHS 139 

Query: 75 AWFVVIMMFLVVSTSLCLIRNVPPFWREMKSFREKVKEKSLAAMRHSSLLDVKIAPEVAK 134 

++++ ++ L V+ C 1+ +P W++ S +E++ + A +H + VKI P+ K 
Sbjct: 140 WYYILFIVLLAVNLIFCSIKRLPRVWKQAFS-KERILKLDEHAEKHLKPITVKI-PDKDK 197 

Query: 135 --RYLEVRGFQGKTVSREDGSVLIAAKKGTMNKWGYIFAQVALIVICLGGLIDSNLLLKL 192 

++L +GF+ V E + + A+KG ++ G +AL+VI G LID 
Sbjct: 198 VLKFLLKKGFK-VFVEEEGNKLYVFAEKGRFSRLGVYITHIALLVIMAGALID 249 

Query: 193 GMLAGRIVPDNQAVYAKDFKPESILGASNLSFRGNVNISEGQSADVVFLNADNGMLVQDL 252 

+I+G RG++ ++EG + DV+ + A+ L 

Sbjct: 250 AIVGV RGSLIVAEGDTNDVMLVGAE— QKPYKL 280 

Query: 253 PFEVKLKKFHIDFY NTGMPRDFA SDIEVTDKATGEKLER — TIRVNHPLT 300 

PFVLFiy N+ + FA SDIE+ + G K+E T++VN P 
Sbjct: 281 PFAVHLIDFRIKTYAEENPNVDKRFAQAVSSYESDIEIIN GGKVEAKGTVKVNEPFD 337 

Query: 301 LHGITIYQASFA— DGGSDLTFKAWNLRDASREP 332 

++QA++ DG S + + + A +P 

Sbjct: 338 FGRYRLFQATYGILDGTSGMGVIWDRKKAHEDP 371 
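
The percentages in the summary line above follow directly from the reported counts over the 334-position alignment. As a minimal sketch (the helper name `blast_percent` is hypothetical, and truncation to a whole number is an assumption chosen because it reproduces the figures shown), the arithmetic is:

```python
def blast_percent(count: int, aligned_length: int) -> int:
    """Return count/aligned_length as a whole-number percentage (truncated).

    Truncation is an assumption made here because it reproduces the
    percentages printed in the summary line; it is not taken from the source.
    """
    if aligned_length <= 0:
        raise ValueError("aligned_length must be positive")
    return (100 * count) // aligned_length


# Counts reported for the gonococcal ORF88 protein against the
# Aquifex aeolicus hypothetical protein.
ALIGNED_LENGTH = 334
summary = {
    "Identities": blast_percent(91, ALIGNED_LENGTH),
    "Positives": blast_percent(159, ALIGNED_LENGTH),
    "Gaps": blast_percent(59, ALIGNED_LENGTH),
}
print(summary)
```

The gap fraction is comparatively high, which is consistent with the alignment covering only part of each protein.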

Based on this analysis, including the putative transmembrane domain in the gonococcal protein, 
it is predicted that the proteins from N.meningitidis and N.gonorrhoeae, and their epitopes, could 
be useful antigens for vaccines or diagnostics, or for raising antibodies. 



Example 40 

The following DNA sequence, believed to be complete, was identified in N.meningitidis <SEQ ID 
337>: 

1 ATGATGAGTA ATAmAATGGm ACAAAAAGGG TTTACATTGA TTGmGmTGAT 

51 GATAGTCGTC GCGATACTCG GCATTATCAG CGTCATTGCC ATACCTTCTT 

101 ATCmAAGTTA TATTGAAAAA GGCTATCAGT CCCAGCTTTA TACGGAGATG 

151 GyCGGTATCA ACAATATTTC CAAACAGTTT ATTTTGAAAA ATCCCCTGGA 

201 CGATAATCAG ACCATCGAGA ACAAACTGGA AATATTTGTC TCAGGCTATA 

251 AGATGAATCC GAAAATTGCC AAAAAaTATA GTGTTTCGGT AAAGTTTGTC 

301 GATAAGGAAA AATCAAGGGC ATACAGGTTG GTCGGCGTTC CGAAGGCGGG 

351 GACGGGTTAT ACTTTGTCGG TATGGATGAA CAGCGTGGGC GACGGATACA 

401 AATGCCGTGA TGCCGCTTCT GCCCAAGCCC ATTTGGAGAC CTTGTCCTCA 

451 GATGTCGGCT GTGAAGCCTT CTCTAATCGT AAAAAATAA 

This corresponds to the amino acid sequence <SEQ ID 338; ORF89>: 

1 MMSNXMXQKG FTLIXXMIVV AILGIISVIA IPSYXSYIEK GYQSQLYTEM 

51 XGINNISKQF ILKNPLDDNQ TIENKLEIFV SGYKMNPKIA KKYSVSVKFV 

101 DKEKSRAYRL VGVPKAGTGY TLSVWMNSVG DGYKCRDAAS AQAHLETLSS 

151 DVGCEAFSNR KK* 
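
The correspondence between the nucleotide listing and the amino acid listing can be checked by translating the reading frame codon by codon. The sketch below assumes the standard genetic code and renders any codon containing an ambiguity symbol (such as m or y) as X, which is consistent with the X residues in the ORF89 listing; the function name `translate` is illustrative only and is not part of the original disclosure.

```python
from itertools import product

# Standard genetic code, codons ordered with bases T, C, A, G
# (first base varies slowest, third base fastest).
BASES = "TCAG"
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    "".join(codon): aa
    for codon, aa in zip(product(BASES, repeat=3), AMINO_ACIDS)
}


def translate(dna: str) -> str:
    """Translate a DNA string in frame 1; unknown or ambiguous codons give 'X'."""
    seq = "".join(dna.split()).upper()  # drop the 10-base block spacing
    usable = len(seq) - len(seq) % 3    # ignore a trailing partial codon
    return "".join(
        CODON_TABLE.get(seq[i:i + 3], "X") for i in range(0, usable, 3)
    )


# First 60 nucleotides of the partially ambiguous ORF89 sequence (SEQ ID 337).
orf89_start = "ATGATGAGTA ATAmAATGGm ACAAAAAGGG TTTACATTGA TTGmGmTGAT GATAGTCGTC"
print(translate(orf89_start))
```

Run on the first 60 nucleotides, this yields the first 20 residues of the ORF89 listing, with X at each position whose codon contains an ambiguous base.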

Further work revealed the complete nucleotide sequence <SEQ ID 339>: 

1 ATGATGAGTA ATAAAATGGA ACAAAAAGGG TTTACATTGA TTGAGATGAT 

51 GATAGTCGTC GCGATACTCG GCATTATCAG CGTCATTGCC ATACCTTCTT 

101 ATCAAAGTTA TATTGAAAAA GGCTATCAGT CCCAGCTTTA TACGGAGATG 

151 GTCGGTATCA ACAATATTTC CAAACAGTTT ATTTTGAAAA ATCCCCTGGA 

201 CGATAATCAG ACCATCGAGA ACAAACTGGA AATATTTGTC TCAGGCTATA 

251 AGATGAATCC GAAAATTGCC AAAAAATATA GTGTTTCGGT AAAGTTTGTC 

301 GATAAGGAAA AATCAAGGGC ATACAGGTTG GTCGGCGTTC CGAAGGCGGG 

351 GACGGGTTAT ACTTTGTCGG TATGGATGAA CAGCGTGGGC GACGGATACA 

401 AATGCCGTGA TGCCGCTTCT GCCCAAGCCC ATTTGGAGAC CTTGTCCTCA 

451 GATGTCGGCT GTGAAGCCTT CTCTAATCGT AAAAAATAA 

This corresponds to the amino acid sequence <SEQ ID 340; ORF89-1>: 

1 MMSNKMEQKG FTLIEMMIVV AILGIISVIA IPSYQSYIEK GYQSQLYTEM 

51 VGINNISKQF ILKNPLDDNQ TIENKLEIFV SGYKMNPKIA KKYSVSVKFV 

101 DKEKSRAYRL VGVPKAGTGY TLSVWMNSVG DGYKCRDAAS AQAHLETLSS 

151 DVGCEAFSNR KK* 

Computer analysis of this amino acid sequence gave the following results: 



Homology with PilE of N.gonorrhoeae (accession number Z69260). 
ORF89 and PilE protein show 30% aa identity in 120aa overlap: 

orf89 8 QKGFTLIXXMIVVAILGIISVIAIPSYXSYIEKGYQSQLYTEMXGINNISKQFILKNPL- 66 

QKGFTLI MIV+AI+GI++ +A+P+Y Y + S+ G + ++L-f + 

PilE 5 QKGFTLIELMIVIAIVGILAAVALPAYQDYTARAQVSEAILLAEGQKSAVTEYYLNHGIW 64 

orf89 67 -DDNQTIENKLEIFVSGYKMNPKIAKKYSVSVKFVDKEKSRAYRLVGVPKAGTGYTLSVW 125 

DN + +G + KI KY SV + GV K G LS+W 

PilE 65 PKDNTS AGVASSDKIKGKYVQSVTVAKGWTAEMASTGVNKEIQGKKLSLW 115 

Homology with a predicted ORF from N.meningitidis (strain A) 

ORF89 shows 83.3% identity over a 162aa overlap with an ORF (ORF89a) from strain A of N. 
meningitidis: 

10 20 30 40 50 60 

orf89.pep MMSNXMXQKGFTLIXXMIVVAILGIISVIAIPSYXSYIEKGYQSQLYTEMXGINNISKQF 

I I I I I I I I I I I I I I II III llllllllllllllllt llllllll 

orf89a MMSNKMEQKGFTLIXXXXXXAIXXXXSVIXXXXYXSYIEKGYQSQLYTEMVGINNISKQX 

10 20 30 40 50 60 

70 80 90 100 110 120 

orf89.pep ILKNPLDDNQTIENKLEIFVSGYKMNPKIAKKYSVSVKFVDKEKSRAYRLVGVPKAGTGY 
|||||!lllll|::||||||llllllllll:ll:III:ll::ll III lllllhllll 
orf 89a ILKNPLDDNQTIKSKLEIFVSGYKMNPKIAEKYNVSVHFVNEEKPRAYSLVGVPKTGTGY 

70 80 90 100 110 120 

130 140 150 160 

orf 8 9 . pep TLSVWMNSVGDGYKCRDAASAQAHLETLSSDVGCEAFSNRKKX 
I I I I I I I I 1 I I M I I i I I I I I : I I I I I I I I I I II t II II I I I I 
orf 8 9a TLSVWMNSVGDGYKCRDAASARAHLETLSSDVGCEAFSNRKKX 

130 140 150 160 

The complete length ORF89a nucleotide sequence <SEQ ID 341> is: 

1 ATGATGAGTA ATAAAATGGA ACAAAAAGGG TTTACATTGA TTGNGANGNT 

51 NATNGNCNTC GCGATACNCN GCNTTANCAG CGTCATTNCN ATNNNTNCNT 

101 ATCNNAGTTA TATTGAAAAA GGCTATCAGT CCCAGCTTTA TACGGAGATG 

151 GTCGGTATCA ACAATATTTC CAAACAGTNT ATTTTGAAAA ATCCCCTGGA 

201 CGATAATCAG ACCATCAAGA GCAAACTGGA AATATTTGTC TCAGGCTATA 

251 AGATGAATCC GAAAATTGCC GAAAAATATA ATGTTTCGGT GCATTTTGTC 

301 AATGAGGAAA AACCNAGGGC ATACAGCTTG GTCGGCGTTC CAAAGACGGG 

351 GACGGGTTAT ACTTTGTCGG TATGGATGAA CAGCGTGGGC GACGGATACA 

401 AATGCCGTGA TGCCGCTTCT GCCCGAGCCC ATTTGGAGAC CTTGTCCTCA 

451 GATGTCGGCT GTGAAGCCTT CTCTAATCGT AAAAAATAG 

This encodes a protein having amino acid sequence <SEQ ID 342>: 

1 MMSNKMEQKG FTLIXXXXXX AIXXXXSVIX XXXYXSYIEK GYQSQLYTEM 

51 VGINNISKQX ILKNPLDDNQ TIKSKLEIFV SGYKMNPKIA EKYNVSVHFV 

101 NEEKPRAYSL VGVPKTGTGY TLSVWMNSVG DGYKCRDAAS ARAHLETLSS 

151 DVGCEAFSNR KK* 

ORF89a and ORF89-1 show 83.3% identity in 162 aa overlap: 

10 20 30 40 50 60 

MMSNKMEQKGFTLIXXXXXXAIXXXXSVIXXXXYXSYIEKGYQSQLYTEMVGINNISKQX 

I M It! I I I M I t 1 II 1)1 I I II I II I 1 I I t I i I I I I I I I I II i 
MMSNKMEQKGFTLIEMMIWAILGIISVIAIPSYQSYIEKGYQSQLYTEMVGINNISKQF 

10 20 30 40 50 60 

70 80 90 100 110 120 

ILKNPLDDNQTIKSKLEIFVSGYKMNPKIAEKYNVSVHFVNEEKPRAYSLVGVPKTGTGY 

II I II I I I II M :: I I I I I I I I II i I II I I : I I : M I : I I I I III I I I I I I : t I It 
ILKNPLDDNQTIENKLEIFVSGYKMNPKIAKKYSVSVKFVDKEKSRAYRLVGVPKAGTGY 

70 80 90 100 110 120 

130 140 150 160 

TLSVWMNSVGDGYKCRDAASARAHLETLSSDVGCEAFSNRKKX 

I I I I II I M t t i II I I II I I I : I I I I I I I I I I II I I I I I I i I I 
TLSVWMNSVGDGYKCRDAASAQAHLETLSSDVGCEAFSNRKKX 



orf 8 9a. pep 
orf89-l 

orf 89a. pep 
orf89-l 

orf 8 9a. pep 
orf89-l 
130 140 150 160 

Homology with a predicted ORF from N.gonorrhoeae 

ORF89 shows 84.6% identity over a 162aa overlap with a predicted ORF (ORF89.ng) from N. 
gonorrhoeae: 

orf89 MMSNXMXQKGFTLIXXMIVVAILGIISVIAIPSYXSYIEKGYQSQLYTEMXGINNISKQF 60 

nil I MIIIM tlll:ltllllMlilll 1 I I I M i I t I I i t I I till: Ml 
orf89ng MMSNKMEQKGFTLIEMMIVVTILGIISVIAIPSYQSYIEKGYQSQLYTEMVGINNVLKQF 60 

orf89 ILKNPLDDNQTIENKLEIFVSGYKMNPKIAKKYSVSVKFVDKEKSRAYRLVGVPKAGTGY 120 

Mill ||t:|:::l|:IIIMIIIIIIIIIIIIIII:lll II IIIMIIIMIMM 
orf89ng ILKNPQDDNDTLKSKLKIFVSGYKMNPKIAKKYSVSVRFVDAEKPRAYRLVGVPNAGTGY 120 

orf89 TLSVWMNSVGDGYKCRDAASAQAHLETLSSDVGCEAFSNRKK 162 

15 I MM I Ml MUM 111:1111: :IM:| Mini MM 

orf89ng TLSVWMNSVGDGYKCRDATSAQAYSDTLSADSGCEAFSNRKK 162 

The complete length ORF89ng nucleotide sequence <SEQ ID 343> is: 

1 aTGATGAGCA ATAAAATGGA ACAAAAAGGG TTTACATTGA TTGAGATGAT 

51 GATAGTTGTC ACGATACTCG GCATCATCAG CGTCATTGCC ATACCTTCTT 

101 ATCAGAGTTA TATTGAAAAA GGCTATCAGT CCCAGCTTTA TACGGAGATG 

151 GTCGGTATCA ACAATGTTCT CAAACAGTTT ATTTTGAAAA ATCCCCAGGA 

201 CGATAATGAT ACCCTCAAGA GCAAACTGAA AATATTTGTC TCAGGCTATA 

251 AGATGAATCC GAAAAttgCC AAAAAATATA GTGTTTCGGt aaggtttGTC 

301 gatGCGGAAA AACCAAGGGC ATACAGGTTG GTCGGCGTTC CGAACGCGGG 

351 GACGGGTTAT ACTTTGTCGG TATGGATGAA CAGCGTGGGC GACGGATACA 

401 AATGCCGTGA TGCCACTTCT GCCCAGGCCT ATTCGGACAC CTTGTCCGCA 

451 GATAGCGGCT GTGAAGCTTT CTCTAATCGT AAAAAATAG 

This encodes a protein having amino acid sequence <SEQ ID 344>: 

1 MMSNKMEQKG FTLIEMMIVV TILGIISVIA IPSYQSYIEK GYQSQLYTEM 
51 VGINNVLKQF ILKNPQDDND TLKSKLKIFV SGYKMNPKIA KKYSVSVRFV 
101 DAEKPRAYRL VGVPNAGTGY TLSVWMNSVG DGYKCRDATS AQAYSDTLSA 
151 DSGCEAFSNR KK* 

This gonococcal protein has a putative leader peptide (underlined) and N-terminal methylation site 
(NMePhe or type-4 pili, double-underlined). In addition, ORF89ng and ORF89-1 show 88.3% 
identity in 162 aa overlap: 

10 20 30 40 50 60 

orf89-1.pep MMSNKMEQKGFTLIEMMIVVAILGIISVIAIPSYQSYIEKGYQSQLYTEMVGINNISKQF 
IIIIIIIIMIIIIIIIMMMMMIIIiniinillMMMMMIMtt: Ml 
orf89ng mmSNKMEQKGFTLIEMMIVVTILGIISVIAIPSYQSYIEKGYQSQLYTEMVGINNVLKQF 
10 20 30 40 50 60 

70 80 90 100 110 120 

orf 8 9-1 pep ILKNPLDDNQTIENKLEIFVSGYKMNPKIAKKYSVSVKFVDKEKSRAYRLVGVPKAGTGY 
II II I M I : I : : : II : t II I II It M II II II M II : I II II I It I tl 1 I I : tl I M 
orf89ng ILKNPQDDNDTLKSKLKIFVSGYKMNPKIAKKYSVSVRFVDAEKPRAYRLVGVPNAGTGY 

70 80 90 100 110 120 

130 140 150 160 

orf 8 9-1 . pep TLSVWMNSVGDGYKCRDAASAQAHLETLSSDVGCEAFSNRKKX 
50 n t II II t M 1 It II II I : II I I :: I I I : I II t I It I II I I 

orf89ng tLSVWMNSVGDGYKCRDATSAQAYSDTLSADSGCEAFSNRKKX 

130 140 150 160 

Based on this analysis, including the gonococcal motifs and the homology with the known PilE 
protein, it was predicted that these proteins from N.meningitidis and N.gonorrhoeae, and their 
epitopes, could be useful antigens for vaccines or diagnostics, or for raising antibodies. 

ORF89-1 (13.6kDa) was cloned in the pGex vector and expressed in E.coli, as described above. 
The products of protein expression and purification were analyzed by SDS-PAGE. Figure 11A 
shows the results of affinity purification of the GST-fusion protein. Purified GST-fusion protein 
was used to immunise mice, whose sera gave a positive result in the ELISA test, confirming that 
ORF89-1 is a surface-exposed protein, and that it is a useful immunogen. 



Example 41 

The following partial DNA sequence was identified in N.meningitidis <SEQ ID 345>: 

1 ATGAAAAAAT CCTCCCTCAT CAGCGCATTG GGCATCGGTA TTTTGAGCAT 

51 CGGCATGGCA TTTGCCGCCC CTGCCGACGC GGTAAGCCAA ATCCGTCAAA 

101 ACGCCACTCA AGTATTGAGC ATCTTAAAAA ACGGCGATGC CAACACCGCT 

151 CGCCAAAAAG CCGAAGCCTA TGCGATTCCC TATTTCGATT TCCAACGTAT 

201 GACCGCATTG GCGGTCGGCA ACCCTTGGsG CACCG.GTCC GACG.GCAAA 

251 AACAAGCGTT GGCCn.AGAA TTTCAACCC... 

This corresponds to the amino acid sequence <SEQ ID 346; ORF91>: 

1 MKKSSLISAL GIGILSIGMA FAAPADAVSQ IRQNATQVLS ILKNGDANTA 
51 RQKAEAYAIP YFDFQRMTAL AVGNPWXTXS DXQKQALAXE FQP... 

Further work revealed the complete nucleotide sequence <SEQ ID 347>: 

1 ATGAAAAAAT CCTCCCTCAT CAGCGCATTG GGCATCGGTA TTTTGAGCAT 

51 CGGCATGGCA TTTGCCGCCC CTGCCGACGC GGTAAGCCAA ATCCGTCAAA 

101 ACGCCACTCA AGTATTGAGC ATCTTAAAAA ACGGCGATGC CAACACCGCT 

151 CGCCAAAAAG CCGAAGCCTA TGCGATTCCC TATTTCGATT TCCAACGTAT 

201 GACCGCATTG GCGGTCGGCA ACCCTTGGCG CACCGCGTCC GACGCGCAAA 

251 AACAAGCGTT GGCCAAAGAA TTTCAAACCC TGCTGATCCG CACCTATTCC 

301 GGCACGATGC TGAAATTAAA AAACGCCAAC GTCAACGTCA AAGACAATCC 

351 CATCGTCAAT AAAGGCGGCA AAGAAATCAT CGTCCGCGCC GAAGTCGGCG 

401 TACCCGGGCA AAAACCCGTC AACATGGACT TCACCACCTA CCAAAGCGGC 

451 GGTAAATACC GTACCTACAA CGTCGCCATC GAAGGCGCGA GCCTGGTTAC 

501 CGTGTACCGC AACCAATTCG GCGAAATTAT CAAAGCGAAA GGCGTGGACG 

551 GACTGATTGC CGAGTTGAAA GCCAAAAACG GCGGCAAATA A 

This corresponds to the amino acid sequence <SEQ ID 348; ORF91-1>: 

1 MKKSSLISAL GIGILSIGMA FAAPADAVSQ IRQNATQVLS ILKNGDANTA 

51 RQKAEAYAIP YFDFQRMTAL AVGNPWRTAS DAQKQALAKE FQTLLIRTYS 

101 GTMLKLKNAN VNVKDNPIVN KGGKEIIVRA EVGVPGQKPV NMDFTTYQSG 

151 GKYRTYNVAI EGASLVTVYR NQFGEIIKAK GVDGLIAELK AKNGGK* 

Computer analysis of this amino acid sequence gave the following results: 
Homology with a predicted ORF from N.meningitidis (strain A) 

ORF91 shows 92.4% identity over a 92aa overlap with an ORF (ORF91a) from strain A of N. 
meningitidis: 

10 20 30 40 50 60 

orf 91.pep MKKSSLISALGIGILSIGMAFAAPADAVSQIRQNATQVLSILKNGDANTARQKAEAYAIP 
I I M i : I I I It t I I I I I It I I i t I I I I I : t I I I I I I I t I I I I t : I i M M I t t I t 1 I I I I 
orf 91a MKKSSFISALGIGILSIGMAFAAPADAVNQIRQNATQVLSILKSGDANTARQKAEAYAIP 

10 20 30 40 50 60 

70 80 90 

orf 91 .pep YFDFQRMTALAVGNPWXTXSDXQKQALAXEFQP 

I I t I t t I t II t t t t I t I t t I I t I I I t t t 
orf 91a YFDFQRMTALAVGNPWRTASDAQKQALAKEFQTLLIRTYSGTMLKLKNANVNVKDNPIVN 

70 80 90 100 110 120 
orf91a KGGKEIIVRAEVGVPGQKPVNMDFTTYQSGGKYRTYNVAIEGASLVTVYRNQFGEIIKAK 
130 140 150 160 170 180 

The complete length ORF91a nucleotide sequence <SEQ ID 349> is: 

1 ATGAAAAAAT CCTCCTTCAT CAGCGCATTG GGCATCGGTA TTTTGAGCAT 

51 CGGCATGGCA TTTGCCGCCC CTGCCGACGC GGTAAACCAA ATCCGTCAAA 

101 ACGCCACTCA AGTATTGAGC ATCTTAAAAA GCGGTGATGC CAACACCGCC 

151 CGCCAAAAAG CCGAAGCCTA TGCGATTCCC TATTTCGATT TCCAACGTAT 

201 GACCGCATTG GCGGTCGGCA ACCCTTGGCG CACCGCGTCC GACGCGCAAA 

251 AACAAGCGTT GGCCAAAGAA TTTCAAACCC TGCTGATCCG CACCTATTCC 

301 GGCACGATGC TGAAATTAAA AAACGCCAAC GTCAACGTCA AAGACAATCC 

351 CATCGTCAAT AAAGGCGGCA AAGAAATCAT CGTCCGCGCC GAAGTCGGCG 

401 TACCCGGGCA AAAACCCGTC AACATGGACT TCACCACCTA CCAAAGCGGC 

451 GGTAAATACC GTACCTACAA CGTCGCCATC GAAGGCGCGA GCCTGGTTAC 

501 CGTGTACCGC AACCAATTCG GCGAAATTAT CAAAGCGAAA GGCGTGGACG 

551 GACTGATTGC CGAGTTGAAG GCTAAAAACG GCAGCAAGTA A 

This encodes a protein having amino acid sequence <SEQ ID 350>: 

1 MKKSSFISAL GIGILSIGMA FAAPADAVNQ IRQNATQVLS ILKSGDANTA 
51 RQKAEAYAIP YFDFQRMTAL AVGNPWRTAS DAQKQALAKE FQTLLIRTYS 
101 GTMLKLKNAN VNVKDNPIVN KGGKEIIVRA EVGVPGQKPV NMDFTTYQSG 
151 GKYRTYNVAI EGASLVTVYR NQFGEIIKAK GVDGLIAELK AKNGSK* 



ORF91a and ORF91-1 show 98.0% identity in 196 aa overlap: 



                    10        20        30        40        50        60
orf91a.pep  MKKSSFISALGIGILSIGMAFAAPADAVNQIRQNATQVLSILKSGDANTARQKAEAYAIP
            |||||:||||||||||||||||||||||:||||||||||||||:||||||||||||||||
orf91-1     MKKSSLISALGIGILSIGMAFAAPADAVSQIRQNATQVLSILKNGDANTARQKAEAYAIP
                    10        20        30        40        50        60

                    70        80        90       100       110       120
orf91a.pep  YFDFQRMTALAVGNPWRTASDAQKQALAKEFQTLLIRTYSGTMLKLKNANVNVKDNPIVN
            ||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
orf91-1     YFDFQRMTALAVGNPWRTASDAQKQALAKEFQTLLIRTYSGTMLKLKNANVNVKDNPIVN
                    70        80        90       100       110       120

                   130       140       150       160       170       180
orf91a.pep  KGGKEIIVRAEVGVPGQKPVNMDFTTYQSGGKYRTYNVAIEGASLVTVYRNQFGEIIKAK
            ||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
orf91-1     KGGKEIIVRAEVGVPGQKPVNMDFTTYQSGGKYRTYNVAIEGASLVTVYRNQFGEIIKAK
                   130       140       150       160       170       180

                   190
orf91a.pep  GVDGLIAELKAKNGSKX
            ||||||||||||||:||
orf91-1     GVDGLIAELKAKNGGKX
                   190
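The percentage quoted for this comparison can be reproduced by counting identical positions. The sketch below assumes an ungapped, equal-length alignment; the helper name percent_identity is hypothetical, and the ORF91-1 sequence is taken from the alignment rows for ORF91a and ORF91-1 (with the terminal X omitted), not from a separate listing.

```python
def percent_identity(a: str, b: str) -> float:
    """Percentage of identical positions in two ungapped, equal-length sequences."""
    if len(a) != len(b):
        raise ValueError("sequences must have the same length")
    matches = sum(x == y for x, y in zip(a, b))
    return 100.0 * matches / len(a)


# ORF91a <SEQ ID 350>, written in blocks of ten as in the listing.
ORF91A = (
    "MKKSSFISAL GIGILSIGMA FAAPADAVNQ IRQNATQVLS ILKSGDANTA "
    "RQKAEAYAIP YFDFQRMTAL AVGNPWRTAS DAQKQALAKE FQTLLIRTYS "
    "GTMLKLKNAN VNVKDNPIVN KGGKEIIVRA EVGVPGQKPV NMDFTTYQSG "
    "GKYRTYNVAI EGASLVTVYR NQFGEIIKAK GVDGLIAELK AKNGSK"
).replace(" ", "")

# ORF91-1 as read from the alignment rows.
ORF91_1 = (
    "MKKSSLISAL GIGILSIGMA FAAPADAVSQ IRQNATQVLS ILKNGDANTA "
    "RQKAEAYAIP YFDFQRMTAL AVGNPWRTAS DAQKQALAKE FQTLLIRTYS "
    "GTMLKLKNAN VNVKDNPIVN KGGKEIIVRA EVGVPGQKPV NMDFTTYQSG "
    "GKYRTYNVAI EGASLVTVYR NQFGEIIKAK GVDGLIAELK AKNGGK"
).replace(" ", "")

print(f"{percent_identity(ORF91A, ORF91_1):.1f}% over {len(ORF91A)} aa")
```

Four positions differ (6, 29, 44 and 195), so 192 of 196 residues are identical, which rounds to the 98.0% stated.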

Homology with a predicted ORF from N.gonorrhoeae

ORF91 shows 84.8% identity over a 92aa overlap with a predicted ORF (ORF91.ng) from N.
gonorrhoeae:

orf91.pep   MKKSSLISALGIGILSIGMAFAAPADAVSQIRQNATQVLSILKNGDANTARQKAEAYAIP   60
            :||||:||||||||||||||||:|||||:||||||||||:|||:||| :|| ||||||:|
orf91ng     VKKSSFISALGIGILSIGMAFASPADAVGQIRQNATQVLTILKSGDAASARPKAEAYAVP   60

orf91.pep   YFDFQRMTALAVGNPWXTXSDXQKQALAXEFQP   93
            |||||||||||||||| | || |||||| |||
orf91ng     YFDFQRMTALAVGNPWRTASDAQKQALAKEFQTLLIRTYSGTMLKFKNATVNVKDNPIVN  120

The complete length ORF91ng nucleotide sequence <SEQ ID 351> is predicted to encode a protein
having amino acid sequence <SEQ ID 352>:

1 VKKSSFISAL GIGILSIGMA FASPADAVGQ IRQNATQVLT ILKSGDAASA

51 RPKAEAYAVP YFDFQRMTAL AVGNPWRTAS DAQKQALAKE FQTLLIRTYS

101 GTMLKFKNAT VNVKDNPIVN KGGKEIVVRA EVGIPGQKPV NMDFTTYQSG

151 GKYRTYNVAI EGTSLVTVYR NQFGEIIKAK GIDGLIAELK AKNGGK*

Further work revealed the complete nucleotide sequence <SEQ ID 353>: 



1 ATGAAAAAAT CCTCCTTCAT CAGCGCATTG GGCATCGGTA TTTTGAGCAT 

51 CGGCATGGCA TTTGCCTCCC CGGCCGACGC AGTGGGACAA ATCCGCCAAA 

101 ACGCCACACA GGTTTTGACC ATCCTCAAAA GCGGCGACGC GGCTTCTGCA 

151 CGCCCAAAAG CCGAAGCCTA TGCGGTTCCC TATTTCGATT TCCAACGTAT 

201 GACCGCATTG GCGGTCGGCA ACCCTTGGCG TACCGCGTCC GACGCGCAAA 

251 AACAAGCGTT GGCCAAAGAA TTTCAAACCC TGCTGATCCG CACCTATTCC 

301 GGCACGATGC TGAAATTCAA AAACGCGACC GTCAACGTCA AAGACAATCC

351 CATCGTCAAT AAGGGCGGCA AGGAAATCGT CGTCCGTGCC GAAGTCGGCA 

401 TCCCCGGTCA GAAGCCCGTC AATATGGACT TTACCACCTA CCAAAGCGGC 

451 GGCAAATACC GTACCTACAA CGTCGCCATC GAAGGCACGA GCCTGGTTAC

501 CGTGTACCGC AACCT^TTCG GCGAAATCAT CAAAGCCT^ GGCATCGACG 

551 GGCTGATTGC CGAGTTGAM GCCAAAAACG GCGGCAAATA A 

This corresponds to the amino acid sequence <SEQ ID 354; ORF91ng-1>:

1 MKKSSFISAL GIGILSIGMA FASPADAVGQ IRQNATQVLT ILKSGDAASA

51 RPKAEAYAVP YFDFQRMTAL AVGNPWRTAS DAQKQALAKE FQTLLIRTYS

101 GTMLKFKNAT VNVKDNPIVN KGGKEIVVRA EVGIPGQKPV NMDFTTYQSG

151 GKYRTYNVAI EGTSLVTVYR NQFGEIIKAK GIDGLIAELK AKNGGK* 

ORF91ng-1 and ORF91-1 show 92.3% identity in 196 aa overlap:

                    10        20        30        40        50        60
orf91-1.pep MKKSSLISALGIGILSIGMAFAAPADAVSQIRQNATQVLSILKNGDANTARQKAEAYAIP
            |||||:||||||||||||||||:|||||:||||||||||:|||:||| :|| ||||||:|
orf91ng-1   MKKSSFISALGIGILSIGMAFASPADAVGQIRQNATQVLTILKSGDAASARPKAEAYAVP
                    10        20        30        40        50        60

                    70        80        90       100       110       120
orf91-1.pep YFDFQRMTALAVGNPWRTASDAQKQALAKEFQTLLIRTYSGTMLKLKNANVNVKDNPIVN
            |||||||||||||||||||||||||||||||||||||||||||||:|||:||||||||||
orf91ng-1   YFDFQRMTALAVGNPWRTASDAQKQALAKEFQTLLIRTYSGTMLKFKNATVNVKDNPIVN
                    70        80        90       100       110       120

                   130       140       150       160       170       180
orf91-1.pep KGGKEIIVRAEVGVPGQKPVNMDFTTYQSGGKYRTYNVAIEGASLVTVYRNQFGEIIKAK
            ||||||:||||||:||||||||||||||||||||||||||||:|||||||||||||||||
orf91ng-1   KGGKEIVVRAEVGIPGQKPVNMDFTTYQSGGKYRTYNVAIEGTSLVTVYRNQFGEIIKAK
                   130       140       150       160       170       180

                   190
orf91-1.pep GVDGLIAELKAKNGGKX
            |:|||||||||||||||
orf91ng-1   GIDGLIAELKAKNGGKX
                   190

In addition, ORF91ng-1 shows homology to a hypothetical E.coli protein:

sp|P45390|YRBC_ECOLI HYPOTHETICAL 24.0 KD PROTEIN IN MURA-RPON INTERGENIC
REGION PRECURSOR (F211) >gi|606130 (U18997) ORF_f211 [Escherichia coli]
>gi|1789583 (AE000399) hypothetical 24.0 kD protein in murZ-rpoN intergenic
region [Escherichia coli] Length = 211

Score = 70.6 bits (170), Expect = 6e-12

Identities = 42/137 (30%), Positives = 76/137 (54%), Gaps = 6/137 (4%)

Query: 59  VPYFDFQRMTALAVGNPWRTASDAQKQALAKEFQTLLIRTYSGTMLKFKNATVNVKDNPI 118
           +PY   +   AL +G  +++A+ AQ++A    F+  L + Y   +  +   T  +   P
Sbjct: 65  LPYVQVKYAGALVLGQYYKSATPAQREAYFAAFREYLKQAYGQALAMYHGQTYQIA--PE 122

Query: 119 VNKGGKEIV-VRAEVGIP-GQKPVNMDFTTYQSG--GKYRTYNVAIEGTSLVTVYRNQFG 174
              G K IV +R  +  P G+ PV +DF   ++   G ++ Y++  EG S++T  +N++G
Sbjct: 123 QPLGDKTIVPIRVTIIDPNGRPPVRLDFQWRKNSQTGNWQAYDMIAEGVSMITTKQNEWG 182

Query: 175 EIIKAKGIDGLIAELKA 191
            +++ KGIDGL A+LK+
Sbjct: 183 TLLRTKGIDGLTAQLKS 199

Based on this analysis, including the presence of a putative leader sequence in the gonococcal
protein, it is predicted that the proteins from N.meningitidis and N.gonorrhoeae, and their epitopes,
could be useful antigens for vaccines or diagnostics, or for raising antibodies.



Example 42 

The following DNA sequence was identified in N.meningitidis <SEQ ID 355>:

1 ATGAAACACA TACTCCCCCT GATTGCCGCA TCCGCACTCT GCATTTCAAC 

51 CGCTTCGGCA CATCCTGCCA GCGAACCGTC CACTCAAAAC GAAACCGCTA 

101 TGATCACGCA TACCCTCATC TCAAAATACA GTTTTGGnnn nnnnnnnnnn 

151 nnnnnnnnnn nnGCCATAAA AAGCAAAGGG ATGGACATTT TTGCCGTCAT 

201 CGACCATCAG GAAGCCGCAC GCCGAAACGG CTTAACGATG CAGCCGGCAA 

251 AAGTCATCGT CTTCGGCACG CCCAAAGCCG GCACGCCGCT GATGGTCAAA 

301 GACCCCGCCT TCGCCCTGCA ACTGCCCCTA CGCGTCCTCG TTACCGAAAC 

351 GGACGGCAAA GTACGCGCCG CCTATACCGA TACGCGCGCC CTCATCGCCG 

401 GCAGCCGCAT CGGTTTCGAC GAAGTGGCAA ACACTTTGGC AAACGCCGAA 

451 AAACTGATAC AAAAAACCGT AGGCGAATAA 

This corresponds to the amino acid sequence <SEQ ID 356; ORF97>: 

1 MKHILPLIAA SALCISTASA HPASEPSTQN ETAMITHTLI SKYSFGXXXX
51 XXXXAIKSKG MDIFAVIDHQ EAARRNGLTM QPAKVIVFGT PKAGTPLMVK 
101 DPAFALQLPL RVLVTETDGK VRAAYTDTRA LIAGSRIGFD EVANTLANAE 
151 KLIQKTVGE* 

Further work revealed the complete nucleotide sequence <SEQ ID 357>: 

1 ATGAAACACA TACTCCCCCT GATTGCCGCA TCCGCACTCT GCATTTCAAC 

51 CGCTTCGGCA CATCCTGCCA GCGAACCGTC CACCCAAAAC GAAACCGCTA 

101 TGACCACGCA TACCCTCACC TCAAAATACA GTTTTGACGA AACCGTCAGC 

151 CGCCTTGAAA CCGCCATAAA AAGCAAAGGG ATGGACATTT TTGCCGTCAT 

201 CGACCATCAG GAAGCCGCCC GCCGAAACGG CTTAACGATG CAGCCGGCAA 

251 AAGTCATCGT CTTCGGCACG CCCAAAGCCG GCACGCCGCT GATGGTCAAA 

301 GACCCCGCCT TCGCCCTGCA ACTGCCCCTA CGCGTCCTCG TTACCGAAAC 

351 GGACGGCAAA GTACGCGCCG CCTATACCGA TACGCGCGCC CTCATCGCCG 

401 GCAGCCGCAT CGGTTTCGAC GAAGTGGCAA ACACTTTGGC AAACGCCGAA 

451 AAACTGATAC AAAAAACCGT AGGCGAATAA 

This corresponds to the amino acid sequence <SEQ ID 358; ORF97-1>:

1 MKHILPLIAA SALCISTASA HPASEPSTQN ETAMTTHTLT SKYSFDETVS
51 RLETAIKSKG MDIFAVIDHQ EAARRNGLTM QPAKVIVFGT PKAGTPLMVK
101 DPAFALQLPL RVLVTETDGK VRAAYTDTRA LIAGSRIGFD EVANTLANAE
151 KLIQKTVGE* 

Computer analysis of this amino acid sequence gave the following results:
Homology with a predicted ORF from N.meningitidis (strain A)

ORF97 shows 88.7% identity over a 159aa overlap with an ORF (ORF97a) from strain A of N.
meningitidis:

                    10        20        30        40        50        60
orf97.pep   MKHILPLIAASALCISTASAHPASEPSTQNETAMITHTLISKYSFGXXXXXXXXAIKSKG
            | |||||  |||||||||| ||||||:||||||| |||| |||||  :     :||||||
orf97a      MXHILPLXXASALCISTASXHPASEPQTQNETAMTTHTLTSKYSFDETVSRLETAIKSKG
                    10        20        30        40        50        60

                    70        80        90       100       110       120
orf97.pep   MDIFAVIDHQEAARRNGLTMQPAKVIVFGTPKAGTPLMVKDPAFALQLPLRVLVTETDGK
            |||||||||||||||||||||||||||||||||||||||||||||||||||| |||||||
orf97a      MDIFAVIDHQEAARRNGLTMQPAKVIVFGTPKAGTPLMVKDPAFALQLPLRVXVTETDGK
                    70        80        90       100       110       120

                   130       140       150       160
orf97.pep   VRAAYTDTRALIAGSRIGFDEVANTLANAEKLIQKTVGEX
            ||||||||||||||||||||||||||||||||||||:|||
orf97a      VRAAYTDTRALIAGSRIGFDEVANTLANAEKLIQKTIGEX
                   130       140       150       160

The complete length ORF97a nucleotide sequence <SEQ ID 359> is: 

1 ATGANACACA TACTCCCCCT GANTGNCGCA TCCGCACTCT GCATTTCAAC 

51 CGCTTCGGNN CATCCTGCCA GCGAACCGCA AACCCAAAAC GAAACCGCTA 

101 TGACCACGCA TACCCTCACC TCAAAATACA GTTTTGACGA AACCGTCAGC

151 CGCCTTGAAA CCGCCATAAA AAGCAAAGGG ATGGACATTT TTGCCGTCAT 

201 CGACCATCAG GAAGCCGCCC GCCGAAACGG CTTAACGATG CAGCCGGCAA 

251 AAGTCATCGT CTTCGGCACG CCCAAAGCCG GTACGCCGCT GATGGTCAAA 

301 GACCCCGCCT TCGCCCTGCA ACTGCCCCTG CGCGTCNTCG TTACCGAAAC 

351 GGACGGCAAA GTACGCGCCG CCTATACCGA TACGCGCGCC CTCATCGCCG

401 GCAGCCGCAT CGGTTTCGAC GAAGTGGCAA ACACTTTGGC AAACGCCGAA 

451 AAACTGATAC AAAAAACCAT AGGCGAATAA 

This encodes a protein having amino acid sequence <SEQ ID 360>: 

1 MXHILPLXXA SALCISTASX HPASEPQTQN ETAMTTHTLT SKYSFDETVS 

51 RLETAIKSKG MDIFAVIDHQ EAARRNGLTM QPAKVIVFGT PKAGTPLMVK

101 DPAFALQLPL RVXVTETDGK VRAAYTDTRA LIAGSRIGFD EVANTLANAE 

151 KLIQKTIGE* 



ORF97a and ORF97-1 show 95.6% identity in 159 aa overlap:

                    10        20        30        40        50        60
orf97a.pep  MXHILPLXXASALCISTASXHPASEPQTQNETAMTTHTLTSKYSFDETVSRLETAIKSKG
            | |||||  |||||||||| ||||||:|||||||||||||||||||||||||||||||||
orf97-1     MKHILPLIAASALCISTASAHPASEPSTQNETAMTTHTLTSKYSFDETVSRLETAIKSKG
                    10        20        30        40        50        60

                    70        80        90       100       110       120
orf97a.pep  MDIFAVIDHQEAARRNGLTMQPAKVIVFGTPKAGTPLMVKDPAFALQLPLRVXVTETDGK
            |||||||||||||||||||||||||||||||||||||||||||||||||||| |||||||
orf97-1     MDIFAVIDHQEAARRNGLTMQPAKVIVFGTPKAGTPLMVKDPAFALQLPLRVLVTETDGK
                    70        80        90       100       110       120

                   130       140       150       160
orf97a.pep  VRAAYTDTRALIAGSRIGFDEVANTLANAEKLIQKTIGEX
            ||||||||||||||||||||||||||||||||||||:|||
orf97-1     VRAAYTDTRALIAGSRIGFDEVANTLANAEKLIQKTVGEX
                   130       140       150       160

Homology with a predicted ORF from N.gonorrhoeae

ORF97 shows 88.1% identity over a 159aa overlap with a predicted ORF (ORF97.ng) from N.
gonorrhoeae:

orf97.pep   MKHILPLIAASALCISTASAHPASEPSTQNETAMITHTLISKYSFGXXXXXXXXAIKSKG   60
            |||||| |||||:||||||||||::| ||||||| |||| |||||  :     :||||||
orf97ng     MKHILPPIAASAFCISTASAHPAGKPPTQNETAMTTHTLTSKYSFDETVSRLETAIKSKG   60

orf97.pep   MDIFAVIDHQEAARRNGLTMQPAKVIVFGTPKAGTPLMVKDPAFALQLPLRVLVTETDGK  120
            ||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
orf97ng     MDIFAVIDHQEAARRNGLTMQPAKVIVFGTPKAGTPLMVKDPAFALQLPLRVLVTETDGK  120

orf97.pep   VRAAYTDTRALIAGSRIGFDEVANTLANAEKLIQKTVGE  159
            ||:|||||||||:||||:|||||||||||||||||||||
orf97ng     VRTAYTDTRALIVGSRISFDEVANTLANAEKLIQKTVGE  159
The complete length ORF97ng nucleotide sequence <SEQ ID 361> is predicted to encode a protein
having amino acid sequence <SEQ ID 362>:

1 MKHILPPIAA SAFCISTASA HPAGKPPTQN ETAMTTHTLT SKYSFDETVS
51 RLETAIKSKG MDIFAVIDHQ EAARRNGLTM QPAKVIVFGT PKAGTPLMVK
101 DPAFALQLPL RVLVTETDGK VRTAYTDTRA LIVGSRISFD EVANTLANAE
151 KLIQKTVGE*

Further work revealed the complete nucleotide sequence <SEQ ID 363>: 

1 ATGAAACACA TACTCCCcct gatcgccgca TccgcactCT GCATTTCAAC 

51 CGCTTCGGCA CACCCTGCCG GCAAACCGCC CACCCAAAAC GAAACCGCTA 

101 TGACCACGCA CACCCTCACC TCGAAATACA GTTTTGACGA AACCGTCAGC

151 CGCCTTGAAA CCGCCATAAA AAGCAAAGGG ATGGACATTT TTGCCGTCAT 

201 CGACCATCAG GAAGCGGCAC GCCGAAACGG CCTGACCATG CAGCCGGCAA 

251 AAGTCATCGT CTTCGGCACG CCCAAGGCCG GTACGCCgct GATGGTCAAA 

301 GACCCCGCCT TCGCCCTGCA ACTGCCCCTG CGCGTCCTCG TTACCGAAAC 

351 GGACGGCAAA GTACGCACCG CCTATACCGA TACGCGCGCC CTCATCGTCG

401 GCAGCCGCAT CAGTTTCGAC GAAGTGGCAA ACACTTTGGC AAACGCCGAA 

451 AAACTGATAC AAAAAACCGT AGGCGAATAA 

This corresponds to the amino acid sequence <SEQ ID 364; ORF97ng-1>:

1 MKHILPLIAA SALCISTASA HPAGKPPTQN ETAMTTHTLT SKYSFDETVS
51 RLETAIKSKG MDIFAVIDHQ EAARRNGLTM QPAKVIVFGT PKAGTPLMVK
101 DPAFALQLPL RVLVTETDGK VRTAYTDTRA LIVGSRISFD EVANTLANAE
151 KLIQKTVGE*

ORF97ng-1 and ORF97-1 show 96.2% identity in 159 aa overlap:

                    10        20        30        40        50        60
orf97-1.pep MKHILPLIAASALCISTASAHPASEPSTQNETAMTTHTLTSKYSFDETVSRLETAIKSKG
            |||||||||||||||||||||||::| |||||||||||||||||||||||||||||||||
orf97ng-1   MKHILPLIAASALCISTASAHPAGKPPTQNETAMTTHTLTSKYSFDETVSRLETAIKSKG
                    10        20        30        40        50        60

                    70        80        90       100       110       120
orf97-1.pep MDIFAVIDHQEAARRNGLTMQPAKVIVFGTPKAGTPLMVKDPAFALQLPLRVLVTETDGK
            ||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
orf97ng-1   MDIFAVIDHQEAARRNGLTMQPAKVIVFGTPKAGTPLMVKDPAFALQLPLRVLVTETDGK
                    70        80        90       100       110       120

                   130       140       150       160
orf97-1.pep VRAAYTDTRALIAGSRIGFDEVANTLANAEKLIQKTVGEX
            ||:|||||||||:||||:||||||||||||||||||||||
orf97ng-1   VRTAYTDTRALIVGSRISFDEVANTLANAEKLIQKTVGEX
                   130       140       150       160

Based on this analysis, including the presence of a putative leader sequence in the gonococcal
protein, it was predicted that the proteins from N.meningitidis and N.gonorrhoeae, and their
epitopes, could be useful antigens for vaccines or diagnostics, or for raising antibodies.

ORF97-1 (15.3kDa) was cloned in pET and pGex vectors and expressed in E.coli, as described
above. The products of protein expression and purification were analyzed by SDS-PAGE. Figures
12A & 12B show, respectively, the results of affinity purification of the GST-fusion and His-fusion
proteins. Purified GST-fusion protein was used to immunise mice, whose sera were used for
Western Blot (Figure 12C), ELISA (positive result), and FACS analysis (Figure 12D). These
experiments confirm that ORF97-1 is a surface-exposed protein, and that it is a useful immunogen.

Figure 12E shows plots of hydrophilicity, antigenic index, and AMPHI regions for ORF97-1.



Example 43 

The following DNA sequence, believed to be complete, was identified in N.meningitidis <SEQ ID
365>:

1 ATGGCTTTTA

51 GCTGATGCTC

101 GCCGTGCCGA

151 CGCTTCCAAA

201 CGTGCCGCTC

251 CTTCTTATCG

301 GACTACAAAC

351 CGgCGCGTTT

401 CCGGCGCGGT

451 GCGGAAGCAG

501 AAAACTGCCC

551 ATTTGGATTC

This corresponds to the amino acid sequence <SEQ ID 366; ORF106>: 



1 MAFITRLFKS SKWLIVPLML PAFQNVAAEG IDVSRAEARI TDGGQLSISS 

51 RFQTELPDQL QQALRRGVPL NFTLSWQLSA PIIASYRFKL GQLIGDDDNI 

101 DYKLSFHPLT KRYRVTVGAF STDYDTLDAA LRATGAVANW KVLNKGALSG 

151 AEAGETKAEI RLTLSTSKLP KPFQINALTS QNWHLDSGWK PLNIIGNK* 

Further work revealed the following DNA sequence <SEQ ID 367>: 



1 ATGGCTTTTA TTACGCGCTT ATTCAAAAGC AGTAAATGGC TGATTGTGCC 

51 GCTGATGCTC CCCGCCTTTC AGAATGTGGC GGCGGAGGGG ATAGATGTGA 

101 GCCGTGCCGA AGCGAGGATA ACCGACGGCG GGCAGCTTTC CATCAGCAGC 

151 CGCTTCCAAA CCGAGCTGCC CGACCAGCTC CAACAGGCGT TGCGCCGGGG 

201 CGTGCCGCTC AACTTTACCT TAAGCTGGCA GCTTTCCGCC CCGATAATCG 

251 CTTCTTATCG GTTTAAATTG GGGCAACTGA TTGGCGATGA CGACAATATT 

301 GACTACAAAC TGAGTTTCCA TCCGCTGACC AACCGCTACC GCGTTACCGT 

351 CGGCGCGTTT TCGACAGACT ACGACACCTT GGATGCGGCA TTGCGCGCGA 

401 CCGGCGCGGT TGCCAACTGG AAAGTCCTGA ACAAAGGCGC GCTGTCCGGT 

451 GCGGAAGCAG GGGAAACCAA GGCGGAAATC CGCCTGACGC TGTCCACTTC 

501 AAAACTGCCC AAGCCTTTTC AAATCAATGC ATTGACTTCT CAAAACTGGC 

551 ATTTGGATTC GGGTTGGAAA CCTCTAAACA TCATCGGGAA CAAATAA 

This corresponds to the amino acid sequence <SEQ ID 368; ORF106-1>: 

1 MAFITRLFKS SKWLIVPLML PAFQNVAAEG IDVSRAEARI TDGGQLSISS

51 RFQTELPDQL QQALRRGVPL NFTLSWQLSA PIIASYRFKL GQLIGDDDNI

101 DYKLSFHPLT NRYRVTVGAF STDYDTLDAA LRATGAVANW KVLNKGALSG

151 AEAGETKAEI RLTLSTSKLP KPFQINALTS QNWHLDSGWK PLNIIGNK* 

Computer analysis of this amino acid sequence gave the following results:
Homology with a predicted ORF from N.meningitidis (strain A)

ORF106 shows 87.4% identity over a 199aa overlap with an ORF (ORF106a) from strain A of N.
meningitidis:

                    10         20        30        40        50       59
orf106.pep  MAFITRLFKSSK-WLIVPLMLPAFQNVAAEGIDVSRAEARITDGGQLSISSRFQTELPDQ
            |||||||||| | ||::  || :: ::||||||||||||||:||||||  ||||||||||
orf106a     MAFITRLFKSIKQWLVLLPMLSVLPDAAAEGIDVSRAEARIXDGGQLSXXSRFQTELPDQ
                    10        20        30        40        50        60

           60        70        80        90       100       110      119
orf106.pep  LQQALRRGVPLNFTLSWQLSAPIIASYRFKLGQLIGDDDNIDYKLSFHPLTKRYRVTVGA
            || |  ||| || || ||||||||||||| ||||||||| |||||||||||:||||||||
orf106a     LQXAXXRGVXLNXTLXWQLSAPIIASYRFXLGQLIGDDDXIDYKLSFHPLTNRYRVTVGA
                    70        80        90       100       110       120

          120       130       140       150       160       170      179
orf106.pep  FSTDYDTLDAALRATGAVANWKVLNKGALSGAEAGETKAEIRLTLSTSKLPKPFQINALT
            ||| ||||||||||||||||||||||||||||||||||||||||||||||||||||||||
orf106a     FSTXYDTLDAALRATGAVANWKVLNKGALSGAEAGETKAEIRLTLSTSKLPKPFQINALT
                   130       140       150       160       170       180

          180       190      199
orf106.pep  SQNWHLDSGWKPLNIIGNKX
            ||||||||||||||||||||
orf106a     SQNWHLDSGWKPLNIIGNKX
                   190       200

Due to the K->N substitution at residue 111, the homology between ORF106a and ORF106-1 is
87.9% over the same 199 aa overlap.
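The step from 87.4% to 87.9% follows from one additional identical position: ORF106 has K at residue 111, whereas ORF106a and ORF106-1 both have N there. A minimal sketch of the arithmetic, assuming the quoted percentages are rounded to one decimal place; the function name is hypothetical.

```python
def identity_after_extra_matches(percent: float, overlap: int, extra: int = 1) -> float:
    """Recover the match count behind a rounded percentage, add `extra`
    identical positions, and return the new percentage to one decimal."""
    matches = round(percent / 100 * overlap)  # 87.4% of 199 -> 174 matches
    return round(100 * (matches + extra) / overlap, 1)


# 174/199 identical for ORF106 vs ORF106a; one more match gives 175/199.
print(identity_after_extra_matches(87.4, 199))
```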



The complete length ORF106a nucleotide sequence <SEQ ID 369> is: 

1 ATGGCTTTTA TTACGCGCTT ATTCAAAAGC ATTAAACAAT GGCTTGTGCT 

51 GCTGCCGATG CTTTCCGTTT TGCCGGACGC GGCGGCGGAG GGGATAGATG 

101 TGAGCCGCGC CGAAGCGAGG ATAANCGACG GCGGGCAGCT TTCCATNAGN

151 AGCCGCTTCC AAACCGAGCT GCCCGACCAG CTCCAANNNG CGNNGNGCCG 

201 GGGCGTGNCG CTCAACTNTA CCTT/^GNTG GCAGCTTTCC GCCCCGATAA 

251 TCGCTTCTTA TCGGTTTNAA TTGGGGCAAC TGATTGGCGA TGACGACNAT 

301 ATTGACTACA AACTGAGTTT CCATCCGCTG ACCAACCGCT ACCGCGTTAC 

351 CGTCGGCGCG TTTTCGACAG ANTACGACAC CTTGGATGCG GCATTGCGCG

401 CGACCGGCGC GGTTGCCAAC TGGAAAGTCC TGAACAAAGG CGCGCTGTCC 

451 GGTGCGGAAG CAGGGGAAAC CAAGGCGGAA ATCCGCCTGA CGCTGTCCAC 

501 TTCAAAACTG CCCAAGCCTT TTCAAATCAA TGCATTGACT TCTCAAAACT 

551 GGCATTTGGA TTCGGGTTGG AAACCTCTAA ACATCATCGG GAACAAATAA 

This encodes a protein having amino acid sequence <SEQ ID 370>:

1 MAFITRLFKS IKQWLVLLPM LSVLPDAAAE GIDVSRAEAR IXDGGQLSXX 
51 SRFQTELPDQ LQXAXXRGVX LNXTLXWQLS APIIASYRFX LGQLIGDDDX 
101 IDYKLSFHPL TNRYRVTVGA FSTXYDTLDA ALRATGAVAN WKVLNKGALS 
151 GAEAGETKAE IRLTLSTSKL PKPFQINALT SQNWHLDSGW KPLNIIGNK* 

Homology with a predicted ORF from N.gonorrhoeae

ORF106 shows 90.5% identity over a 199aa overlap with a predicted ORF (ORF106.ng) from N.
gonorrhoeae:

orf106.pep  MAFITRLFKSSK-WLIVPLMLPAFQNVAAEGIDVSRAEARITDGGQLSISSRFQTELPDQ   59
            |||||||||| | ||::  :| :: ::||||| ::||||||||||:||||||||||||||
orf106ng    MAFITRLFKSIKQWLVLLPILSVLPDAAAEGIAATRAEARITDGGRLSISSRFQTELPDQ   60

orf106.pep  LQQALRRGVPLNFTLSWQLSAPIIASYRFKLGQLIGDDDNIDYKLSFHPLTKRYRVTVGA  119
            |||||||||||||||||||||| ||||||||||||||||||||||||||||:||||||||
orf106ng    LQQALRRGVPLNFTLSWQLSAPTIASYRFKLGQLIGDDDNIDYKLSFHPLTNRYRVTVGA  120

orf106.pep  FSTDYDTLDAALRATGAVANWKVLNKGALSGAEAGETKAEIRLTLSTSKLPKPFQINALT  179
            ||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
orf106ng    FSTDYDTLDAALRATGAVANWKVLNKGALSGAEAGETKAEIRLTLSTSKLPKPFQINALT  180

orf106.pep  SQNWHLDSGWKPLNIIGNK  198
            |||||||||||||||||||
orf106ng    SQNWHLDSGWKPLNIIGNK  199

Due to the K->N substitution at residue 111, the homology between ORF106ng and ORF106-1 is
91.0% over the same 199 aa overlap.

The complete length ORF106ng nucleotide sequence <SEQ ID 371> is:

1 ATGGCTTTTA TTACGCGCTT ATTCAAAAGC ATTAAACAAT GGCTTGTGCT 

51 GTTGCCGATA CTCTCCGTTT TGCCGGACGC GGCGGCGGAG GGCATTGCCG 

101 CGACCCGCGC CGAAGCGAGG ATAACCGACG GCGGGCGGCT TTCCATCAGC 

151 AGCCGCTTCC AAACCGAGCT GCCCGACCAG CTCCAACAGG CGTTGCGCCG 

201 GGGCGTACCG CTCAACTTTA CCTTAAGCTG GCAGCTTTCC GCCCCGACAA 

251 TCGCTTCTTA TCGGTTTAAA TTGGGGCAAC TGATTGGCGA TGACGACAAT 

301 ATTGACTACA AACTAAGTTT CCATCCGCTG ACCAACCGCT ACCGCGTTAC 

351 CGTCGGCGCA TTTTCCACCG ATTACGACAC TTTGGATGCG GCATTGCGCG 

401 CGACCGGCGC GGTTGCCAAC TGGAAAGTCC TGAACAAAGG CGCGTTGTCC 

451 GGTGCGGAAG CAGGGGAAAC CAAGGCGGAA ATCCGCCTGA CGCTGTCCAC 

501 TTCAAAACTG CCCAAGCCTT TCCAAATCAA CGCATTGACT TCTCAAAACT 

551 GGCATTTGGA TTCGGGTTGG AAACCTCTAA ACATCATCGG GAACAAATAA 

This encodes a protein having amino acid sequence <SEQ ID 372>: 

1 MAFITRLFKS IKQWLVLLPI LSVLPDAAAE GIAATRAEAR ITDGGRLSIS

51 SRFQTELPDQ LQQALRRGVP LNFTLSWQLS APTIASYRFK LGQLIGDDDN

101 IDYKLSFHPL TNRYRVTVGA FSTDYDTLDA ALRATGAVAN WKVLNKGALS

151 GAEAGETKAE IRLTLSTSKL PKPFQINALT SQNWHLDSGW KPLNIIGNK*

Based on this analysis, including the presence of a putative leader sequence in the gonococcal
protein, it was predicted that the proteins from N.meningitidis and N.gonorrhoeae, and their
epitopes, could be useful antigens for vaccines or diagnostics, or for raising antibodies.

ORF106-1 (18kDa) was cloned in pET and pGex vectors and expressed in E.coli, as described
above. The products of protein expression and purification were analyzed by SDS-PAGE. Figure
13A shows the results of affinity purification of the His-fusion protein, and Figure 13B shows the
results of expression of the GST-fusion in E.coli. Purified His-fusion protein was used to immunise
mice, whose sera were used for FACS analysis (Figure 13C). These experiments confirm that
ORF106-1 is a surface-exposed protein, and that it is a useful immunogen.

Example 44 

The following DNA sequence, believed to be complete, was identified in N.meningitidis <SEQ ID
373>: 

1 ATGGACACAA AAGAAATCCT CGG.TACGCG GcAGGcTCGA TCGGCAGCGC 

51 GGTTTTAGCC GTCATCATCc TGCCGCTGCT GTCGTGGTAT TTCCCCGCCG 

101 ACGACATCGG GCGCATCGTG CTGATGCAGA CGGCGGCGGG GCTgACGGTG 

151 TCGGTGTTGT GCCTCGGGCT GGATCAGGCA TACGTCCGCG AATACTATGC 

201 CACCGCCGAC AAAGACAcCT TGTTCAAAAC CCTGTTCCTG CCGCCGCTGC 

251 TGTCTGCCGC CGCGATAGCC GCCCTGCTGC TTTCCCGCCC GTCCCTGCCG 

301 TCTGAAATCC TGTTTTCACT CGACGATGCC gCCGCCGGCa TCGGGCTGGT 

351 GCTGTTTGAA CtGAGCTTCC TGCCCATCCG cTTTCTCTTA CTGGTTTTGC 

401 GTATGGAAGG ACGCGCCcTT GCCTTTTCGT CCGCGCAACT CGTGCcCAAG 

451 CTCGCCATCC TGCTGCTG.T GCCGCTGACG GTCGGGCTGC TGCACTTTCC 

501 AGCGAACACC GCCGTCCTGA CCGCCGTTTA CGCGCTGGCA AACCTTGCCG 

551 CCGCCGCCTT TTTGCTGTTT CAAAACCGAT GCCGTCTGAA GGCCGTCCGG 

601 CACGCACCGT TTTCGCCCGC CGTCCTGCAC CGGGGG.TGC GCTACGGCAT 

651 ACCGATCGCA CTGAGCAGCA TCGCCTATTG GGGGCTGGCA TCCGCCGACC 

701 GTTTGTTCCT GAAAAAATAT GCCGGCCTGG AACAGCTCGG CGTTTATTCG 

751 ATGGGTATTT CGTTCGGCGG GGCGGCATTA TTGTTCCAAA GCATCTTTTC 

801 AACGGTCTGG ACACCGTATA TTTTCCGCGC AATCGAAGAA AACGCCCCGC 

851 CCGCTCGCCT CTCGGCAACG GCAGAATCCG CCGCCGCCCT GCTTGCCTCC 

901 GCCCTCTGC. TGACCGGCAT TTTCTCGCCC CTTGCCTCCC TCCTGCTGCC 

951 GGAAAACTAC GCCGCCGTCC GGTTTATCGT CGTATCGTGT ATG.TGCCGC 
1001 CGCTGTTTTG CACGCTGGCG GAAATCAGCG GCATCGGTTT GAACGTCGTT

1051 CGCATU^CGC GCCCGATCGC GCTCGCCACC TTGGGCGCGC TGGCGGCAAA 

1101 CCTGCTGCTG CTGGGGCTTG ACCGTGCCGT ACCGGCGAGG CCGCC.GGCG 

1151 CGGCGGTTGC CTGTGCCGCC TCATTCTGGC TGTTTTTTGC CTTCAAGACC 

1201 GAAAGCTCyT GCCGCCTGTG GCAGCCGCTC AAACGCCTGC CGCTTTATCT

1251 GCACACATTG TTCTGCCTGA CCTCCTCGGC GGCCTACACC TGCTTCGGCA 

1301 CGCCGGCAAA CTATCCCCTG TTTGCCGGCG TATGGGCGGC ATATCTGGCA 

1351 GGCTGCATCC TGCGCCACCG GAAAGATTTG CACAAACTGT TTCATTATTT 

1401 GAAAAAACAA GGTTTCCCAT TATGA 

This corresponds to the amino acid sequence <SEQ ID 374; ORF10>:

1 MDTKEILXYA AGSIGSAVLA VIILPLLSWY FPADDIGRIV LMQTAAGLTV

51 SVLCLGLDQA YVREYYATAD KDTLFKTLFL PPLLSAAAIA ALLLSRPSLP

101 SEILFSLDDA AAGIGLVLFE LSFLPIRFLL LVLRMEGRAL AFSSAQLVPK

151 LAILLLXPLT VGLLHFPANT AVLTAVYALA NLAAAAFLLF QNRCRLKAVR

201 HAPFSPAVLH RGXRYGIPIA LSSIAYWGLA SADRLFLKKY AGLEQLGVYS

251 MGISFGGAAL LFQSIFSTVW TPYIFRAIEE NAPPARLSAT AESAAALLAS

301 ALCXTGIFSP LASLLLPENY AAVRFIVVSC MXPPLFCTLA EISGIGLNVV

351 RKTRPIALAT LGALAANLLL LGLDRAVPAR PXGAAVACAA SFWLFFAFKT

401 ESSCRLWQPL KRLPLYLHTL FCLTSSAAYT CFGTPANYPL FAGVWAAYLA

451 GCILRHRKDL HKLFHYLKKQ GFPL*

Further sequence analysis revealed the complete DNA sequence <SEQ ID 375> to be:

1 ATGGACACAA AAGAAATCCT CGGCTACGCG GCAGGCTCGA TCGGCAGCGC 

51 GGTTTTAGCC GTCATCATCC TGCCGCTGCT GTCGTGGTAT TTCCCCGCCG 

101 ACGACATCGG GCGCATCGTG CTGATGCAGA CGGCGGCGGG GCTGACGGTG 

151 TCGGTGTTGT GCCTCGGGCT GGATCAGGCA TACGTCCGCG AATACTATGC

201 CACCGCCGAC AAAGACACCT TGTTCAAAAC CCTGTTCCTG CCGCCGCTGC 

251 TGTCTGCCGC CGCGATAGCC GCCCTGCTGC TTTCCCGCCC GTCCCTGCCG 

301 TCTGAAATCC TGTTTTCACT CGACGATGCC GCCGCCGGCA TCGGGCTGGT 

351 GCTGTTTGAA CTGAGCTTCC TGCCCATCCG CTTTCTCTTA CTGGTTTTGC 

401 GTATGGAAGG ACGCGCCCTT GCCTTTTCGT CCGCGCAACT CGTGCCCAAG

451 CTCGCCATCC TGCTGCTGCT GCCGCTGACG GTCGGGCTGC TGCACTTTCC 

501 AGCGAACACC GCCGTCCTGA CCGCCGTTTA CGCGCTGGCA AACCTTGCCG 

551 CCGCCGCCTT TTTGCTGTTT CAAAACCGAT GCCGTCTGAA GGCCGTCCGG 

601 CACGCACCGT TTTCGCCCGC CGTCCTGCAC CGGGGGCTGC GCTACGGCAT 

651 ACCGATCGCA CTGAGCAGCA TCGCCTATTG GGGGCTGGCA TCCGCCGACC

701 GTTTGTTCCT GAAAAAATAT GCCGGCCTGG AACAGCTCGG CGTTTATTCG 

751 ATGGGTATTT CGTTCGGCGG GGCGGCATTA TTGTTCCAAA GCATCTTTTC 

801 AACGGTCTGG ACACCGTATA TTTTCCGCGC AATCGAAGAA AACGCCCCGC 

851 CCGCCCGCCT CTCGGCAACG GCAGAATCCG CCGCCGCCCT GCTTGCCTCC 

901 GCCCTCTGCC TGACCGGCAT TTTCTCGCCC CTTGCCTCCC TCCTGCTGCC

951 GGAAAACTAC GCCGCCGTCC GGTTTATCGT CGTATCGTGT ATGCTGCCGC 

1001 CGCTGTTTTG CACGCTGGCG GAAATCAGCG GCATCGGTTT GAACGTCGTC 

1051 CGCAAAACGC GCCCGATCGC GCTCGCCACC TTGGGCGCGC TGGCGGCAAA 

1101 CCTGCTGCTG CTGGGGCTTG CCGTGCCGTC CGGCGGCGGG CGCGGCGCGG 

1151 CGGTTGCCTG TGCCGCCTCA TTCTGGCTGT TTTTTGCCTT CAAGACCGAA

1201 AGCTCCTGCC GCCTGTGGCA GCCGCTCAAA CGCCTGCCGC TTTATCTGCA 

1251 CACATTGTTC TGCCTGACCT CCTCGGCGGC CTACACCTGC TTCGGCACGC 

1301 CGGCAAACTA TCCCCTGTTT GCCGGCGTAT GGGCGGCATA TCTGGCAGGC 

1351 TGCATCCTGC GCCACCGGAA AGATTTGCAC AAACTGTTTC ATTATTTGAA 

1401 AAAACAAGGT TTCCCATTAT GA

This corresponds to the amino acid sequence <SEQ ID 376; ORF10-1>: 

1 MDTKEILGYA AGSIGSAVLA VIILPLLSWY FPADDIGRIV LMQTAAGLTV

51 SVLCLGLDQA YVREYYATAD KDTLFKTLFL PPLLSAAAIA ALLLSRPSLP

101 SEILFSLDDA AAGIGLVLFE LSFLPIRFLL LVLRMEGRAL AFSSAQLVPK

151 LAILLLLPLT VGLLHFPANT AVLTAVYALA NLAAAAFLLF QNRCRLKAVR

201 HAPFSPAVLH RGLRYGIPIA LSSIAYWGLA SADRLFLKKY AGLEQLGVYS

251 MGISFGGAAL LFQSIFSTVW TPYIFRAIEE NAPPARLSAT AESAAALLAS

301 ALCLTGIFSP LASLLLPENY AAVRFIVVSC MLPPLFCTLA EISGIGLNVV

351 RKTRPIALAT LGALAANLLL LGLAVPSGGA RGAAVACAAS FWLFFAFKTE

401 SSCRLWQPLK RLPLYLHTLF CLTSSAAYTC FGTPANYPLF AGVWAAYLAG

451 CILRHRKDLH KLFHYLKKQG FPL*



Computer analysis of this amino acid sequence gave the following results: 
Prediction

ORF10-1 is predicted to be the precursor of an integral membrane protein, since it comprises
several (12-13) potential transmembrane segments, and a probable cleavable signal peptide.
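The text does not state which program produced this prediction. As an illustration only, candidate membrane-spanning stretches can be located with a sliding-window hydropathy scan. The sketch below swaps in the Kyte-Doolittle hydropathy scale with a 19-residue window and a mean-score cutoff of 1.6; the window, the cutoff and both function names are assumptions, and the result need not agree with the 12-13 segments quoted above.

```python
# Kyte-Doolittle hydropathy values (assumed scale; not named in the text).
KD = {
    "A": 1.8, "R": -4.5, "N": -3.5, "D": -3.5, "C": 2.5,
    "Q": -3.5, "E": -3.5, "G": -0.4, "H": -3.2, "I": 4.5,
    "L": 3.8, "K": -3.9, "M": 1.9, "F": 2.8, "P": -1.6,
    "S": -0.8, "T": -0.7, "W": -0.9, "Y": -1.3, "V": 4.2,
}


def hydropathy_profile(seq: str, window: int = 19) -> list[float]:
    """Mean hydropathy of every full window; unknown residues (e.g. X) score 0."""
    values = [KD.get(aa, 0.0) for aa in seq.upper()]
    return [
        sum(values[i:i + window]) / window
        for i in range(len(values) - window + 1)
    ]


def candidate_tm_segments(seq: str, window: int = 19,
                          threshold: float = 1.6) -> list[tuple[int, int]]:
    """Merge overlapping windows scoring >= threshold into
    0-based, half-open (start, end) regions."""
    segments: list[tuple[int, int]] = []
    for i, score in enumerate(hydropathy_profile(seq, window)):
        if score < threshold:
            continue
        start, end = i, i + window
        if segments and start <= segments[-1][1]:
            segments[-1] = (segments[-1][0], max(segments[-1][1], end))
        else:
            segments.append((start, end))
    return segments


# Synthetic check: one hydrophobic stretch flanked by charged residues.
print(candidate_tm_segments("D" * 10 + "L" * 20 + "K" * 10))
```

Applied to <SEQ ID 376>, the function returns regions that can be compared by eye with the prediction; dedicated topology predictors use more elaborate models than this windowed mean.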

Homology with EpsM from Streptococcus thermophilus (accession number U40830).
ORF10 shows homology with the epsM gene of S.thermophilus, which encodes a protein of a size
similar to ORF10 and is involved in exopolysaccharide synthesis. Other homologies are with
prokaryotic membrane proteins:

Identities = (25%)

Query: 213 LRYGIPLALSSLAYWGLASADRLFLKKYAGLEQLGVYSMGISFGGAALLLQSIFSTVW 270
           L Y +PL  SS+ +W L ++ R F+  + G    G+ ++         +  +IF+  W
Sbjct: 210 LYYALPLIPSSILWWLLNASSRYFVLFFLGAGANGLLAVATKIPSIISIFNTIFTQAW 267

Identities = 15/57 (26%), Positives = 31/57 (54%)

Query: 7   LGYAAGSIGSAVLAVIILPLLSWYFPADDIGRIVLMQTAAGLTVSVLCLGLDQAYVR 63
           L +  G++GS +L  +++PL ++     +  G   L QT A L + ++ + +  A +R
Sbjct: 12  LVFTIGNLGSKLLVFLLVPLYTYAMTPQEYGMADLYQTTANLLLPLITMNVFDATLR 68

Identities = 16/96 (16%), Positives = 36/96 (37%)

Query: 307 IFSPLASLLLPENYAAVRFTWSCMLPPLFYTLTEISGIGLNWRKTRPIXXXXXXXXXX 366
+ P+ ++ +YA+ V ML LF + ++ G ++T+ +
Sbjct: 305 VLKPIVEKWSSDYASSWQYVPFFMLSMLFSSFSDFFGTNYIAAKQTKGVFMTSIYGTIV 364

Homology with a predicted ORF from N.meningitidis (strain A)

ORF10 shows 95.4% identity over a 475aa overlap with an ORF (ORF10a) from strain A of N.
meningitidis:

                    10        20        30        40        50        60
orf10.pep   MDTKEILXYAAGSIGSAVLAVIILPLLSWYFPADDIGRIVLMQTAAGLTVSVLCLGLDQA
            ||||||| ||||||||||||||||||||||||||||||||||||||||||||||||||||
orf10a      MDTKEILGYAAGSIGSAVLAVIILPLLSWYFPADDIGRIVLMQTAAGLTVSVLCLGLDQA
                    10        20        30        40        50        60

                    70        80        90       100       110       120
orf10.pep   YVREYYATADKDTLFKTLFLPPLLSAAAIAALLLSRPSLPSEILFSLDDAAAGIGLVLFE
            |||||||:||||||||||||||||||||||||||||||||||||||||||||||||||||
orf10a      YVREYYAAADKDTLFKTLFLPPLLSAAAIAALLLSRPSLPSEILFSLDDAAAGIGLVLFE
                    70        80        90       100       110       120

                   130       140       150       160       170       180
orf10.pep   LSFLPIRFLLLVLRMEGRALAFSSAQLVPKLAILLLXPLTVGLLHFPANTAVLTAVYALA
            |||||||||||||||||||||||||||| ||||||| |||||||||||||||||||||||
orf10a      LSFLPIRFLLLVLRMEGRALAFSSAQLVSKLAILLLLPLTVGLLHFPANTAVLTAVYALA
                   130       140       150       160       170       180

                   190       200       210       220       230       240
orf10.pep   NLAAAAFLLFQNRCRLKAVRHAPFSPAVLHRGXRYGIPIALSSIAYWGLASADRLFLKKY
            ||||||||||||||||||||:|||| |||||| |||||||||||||||||||||||||||
orf10a      NLAAAAFLLFQNRCRLKAVRRAPFSSAVLHRGLRYGIPIALSSIAYWGLASADRLFLKKY
                   190       200       210       220       230       240

                   250       260       270       280       290       300
orf10.pep   AGLEQLGVYSMGISFGGAALLFQSIFSTVWTPYIFRAIEENAPPARLSATAESAAALLAS
            ||||||||||||||||||||||||||||||||||||||| ||||||||||||||||||||
orf10a      AGLEQLGVYSMGISFGGAALLFQSIFSTVWTPYIFRAIEANAPPARLSATAESAAALLAS
                   250       260       270       280       290       300
250 260 270 280 290 300 
                   310       320       330       340       350       360
orf10.pep   ALCXTGIFSPLASLLLPENYAAVRFIVVSCMXPPLFCTLAEISGIGLNVVRKTRPIALAT
            ||| |||||||||||||||||||||||||| |||||||:|||||||||||||||||||||
orf10a      ALCLTGIFSPLASLLLPENYAAVRFIVVSCMLPPLFCTLVEISGIGLNVVRKTRPIALAT
                   310       320       330       340       350       360

                   370       380        390       400       410      419
orf10.pep   LGALAANLLLLGLDRAVPAR-PXGAAVACAASFWLFFAFKTESSCRLWQPLKRLPLYLHT
            |||||||||||||  |||:    |||||||||||||||:|||||||||||||||||||:||
orf10a      LGALAANLLLLGL--AVPSGGARGAAVACAASFWLFFVFKTESSCRLWQPLKRLPLYMHT
                   370         380       390       400       410

          420       430       440       450       460       470
orf10.pep   LFCLTSSAAYTCFGTPANYPLFAGVWAAYLAGCILRHRKDLHKLFHYLKKQGFPLX
            ||||:||||||||||||||||||||||:||||||||||||||||||||||||||||
orf10a      LFCLASSAAYTCFGTPANYPLFAGVWAVYLAGCILRHRKDLHKLFHYLKKQGFPLX
           420       430       440       450       460       470

The complete length ORF10a nucleotide sequence <SEQ ID 377> is:

1 ATGGACACAA AAGAAATCCT CGGCTACGCG GCAGGCTCGA TCGGCAGCGC

51 GGTTTTAGCC GTCATCATCC TGCCGCTGCT GTCGTGGTAT TTCCCTGCCG

101 ACGACATCGG ACGCATCGTG CTGATGCAGA CGGCGGCGGG GCTGACGGTG

151 TCGGTGTTGT GCCTCGGGCT GGATCAGGCA TACGTCCGCG AATACTATGC

201 CGCCGCCGAC AAAGACACTT TGTTCAAAAC CCTGTTCCTG CCGCCGCTGC

251 TGTCTGCCGC CGCGATAGCC GCCCTGCTGC TTTCCCGCCC ATCCCTGCCG

301 TCTGAAATCC TGTTTTCGCT CGACGATGCC GCCGCCGGCA TCGGGCTGGT

351 GCTGTTTGAA CTGAGCTTCC TGCCCATCCG CTTTCTCTTA CTGGTTTTGC

401 GTATGGAAGG ACGCGCCCTT GCCTTTTCGT CCGCGCAACT CGTGTCCAAG

451 CTCGCCATCC TGCTGCTGCT GCCGCTGACG GTCGGGCTGC TGCACTTTCC

501 GGCGAACACC GCCGTCCTGA CCGCCGTTTA CGCGCTGGCA AACCTTGCCG

551 CCGCCGCCTT TTTGCTGTTT CAAAACCGAT GCCGTCTGAA GGCCGTCCGG

601 CGCGCACCGT TTTCATCCGC CGTCCTGCAT CGCGGCCTGC GCTACGGCAT

651 ACCGATCGCA CTAAGCAGCA TCGCCTATTG GGGGCTGGCA TCCGCCGACC

701 GTTTGTTCCT GAAAAAATAT GCCGGCCTAG AACAGCTCGG CGTTTATTCG

751 ATGGGTATTT CGTTCGGCGG AGCGGCATTA TTGTTCCAAA GCATCTTTTC

801 AACGGTCTGG ACACCGTATA TTTTCCGCGC AATCGAAGCA AACGCCCCGC

851 CCGCCCGCCT CTCGGCAACG GCAGAATCCG CCGCCGCCCT GCTTGCCTCC

901 GCCCTCTGCC TGACCGGCAT TTTCTCGCCC CTCGCCTCCC TCCTGCTGCC

951 GGAAAACTAC GCCGCCGTCC GGTTTATCGT CGTATCGTGT ATGCTGCCTC

1001 CGCTGTTTTG CACGCTGGTA GAAATCAGCG GCATCGGTTT GAACGTCGTC

1051 CGAAAAACAC GCCCGATCGC GCTCGCCACC TTGGGCGCGC TGGCGGCAAA

1101 CCTGCTGCTG CTGGGGCTTG CCGTACCGTC CGGCGGCGGG CGCGGCGCGG

1151 CGGTTGCCTG TGCCGCCTCA TTTTGGCTGT TTTTTGTTTT CAAGACCGAA

1201 AGCTCCTGCC GCCTGTGGCA GCCGCTCAAA CGCCTGCCGC TTTATATGCA

1251 CACATTGTTC TGCCTGGCCT CCTCGGCGGC CTACACCTGC TTCGGCACTC

1301 CGGCAAACTA CCCCCTGTTT GCCGGCGTAT GGGCGGTATA TCTGGCAGGC

1351 TGCATCCTGC GCCACCGGAA AGATTTGCAC AAACTGTTTC ATTATTTGAA

1401 AAAACAAGGT TTCCCATTAT GA

This encodes a protein having amino acid sequence <SEQ ID 378>:



1 MDTKEILGYA AGSIGSAVLA VIILPLLSWY FPADDIGRIV LMQTAAGLTV 

51 SVLCLGLDQA YVREYYAAAD KDTLFKTLFL PPLLSAAAIA ALLLSRPSLP 

101 SEILFSLDDA AAGIGLVLFE LSFLPIRFLL LVLRMEGRAL AFSSAQLVSK 

151 LAILLLLPLT VGLLHFPANT AVLTAVYALA NLAAAAFLLF QNRCRLKAVR 

201 RAPFSSAVLH RGLRYGIPIA LSSIAYWGLA SADRLFLKKY AGLEQLGVYS 

251 MGISFGGAAL LFQSIFSTVW TPYIFRAIEA NAPPARLSAT AESAAALLAS 

301 ALCLTGIFSP LASLLLPENY AAVRFIVVSC MLPPLFCTLV EISGIGLNVV

351 RKTRPIALAT LGALAANLLL LGLAVPSGGA RGAAVACAAS FWLFFVFKTE 

401 SSCRLWQPLK RLPLYMHTLF CLASSAAYTC FGTPANYPLF AGVWAVYLAG 

451 CILRHRKDLH KLFHYLKKQG FPL* 

ORF10a and ORF10-1 show 95.4% identity in 475 aa overlap:



10 20 30 40 50 60 

orf 10-1 . pep MDTKEILXYAAGSIGSAVLAVIILPLLSWYFPADDIGRIVLMQTAAGLTVSVLCLGLDQA 

iiinii iiiitiiitiiiiiiiiiiiitiiiiiiiitiiiiiiiiiiMiiiiiiiii 

orf 10a MDTKEILGYAAGSIGSAVLAVIILPLLSWYFPADDIGRIVLMQTAAGLTVSVLCLGLDQA 

10 20 30 40 50 60 
70 80 90 100 110 120

orf 10-1 . pep YVREYYATADKDTLFKTLFLPPLLSAAAIAALLLSRPSLPSEILFSLDDAAAGIGLVLFE 
I 11 I I I I : I I I I I t I I i I I t t I I I I I I t i I I i I I I I I I I I I I I I I I 1 I i I I I t I I I I I I i 
orf 10a YVREYYAAADKDTLFKTLFLPPLLSAAAIAALLLSRPSLPSEILFSLDDAAAGIGLVLFE 

70 80 90 100 110 120 

130 140 150 160 170 180 

orf 10-1 . pep LSFLPIRFLLLVLRMEGRALAFSSAQLVPKLAILLLXPLTVGLLHFPANTAVLTAVYALA 
I I I I I It I I I I M I I t I I I I t I I I I t I I I I I I I t I I I I t I I I I I I i t I M i I I I I I I i 
orf 10a LSFLPIRFLLLVLRMEGRALAFSSAQLVSKLAILLLLPLTVGLLHFPANTAVLTAVYALA 

130 140 150 160 170 180 

190 200 210 220 230 240 

orf 10-1 . pep NIAAAAFLLFQNRCRLKAVRHAPFSPAVLHRGXRYGIPIALSSIAYWGLASADRLFLKKY 
|||||IMIIIItilMIII:llll IIMil llllllttllllltliltllltlllll 
orf 10a NLAAAAFLLFQNRCRLKAVRRAPFSSAVLHRGLRYGIPIALSSIAYWGLASADRLFLKKY 

190 200 210 220 230 240 

250 260 270 280 290 300 

orf 10-1 . pep AGLEQLGVYSMGISFGGAALLFQSIFSTVWTPYIFRAIEENAPPARLSATAESAAALLAS 
I I I I i I i I I M I I I I I I I i t I I t I M I I i I I I I I ! I I I I I I I t I I I I I I I I t I I I I I I I 
orf 10a AGLEQLGVYSMGISFGGAALLFQSIFSTVWTPYIFRAIEANAPPARLSATAESAAALLAS 

250 260 270 280 290 300 

310 320 330 340 350 360 

orf 10-1 . pep ALCXTGIFSPIASLLLPENYAAVRFIVVSCMXPPLFCTLAEISGIGLNVVRKTRPIALAT
111 I I 1 II I I I I II I M I II I I M I I I I I I I II t I I I : I I II I M It I I I II II II I I 
orf 10a ALCLTGIFSPLASLLLPENYAAVRFIWSCMLPPLFCTLVEISGIGLNWRKTRPIALAT 

310 320 330 340 350 360 

370 380 390 400 410 419 

orf 10-1 . pep LGAIAANLLLLGLDRAVPAR-PXGAAVACAASFWLFFAFKTESSCRLWQPLKRLPLYLHT
I I I I 1 i t 1 I I II I 111: I I t I t I I M II I t I : I I I I I I I I I II M II II II : II 
orf 10a LGALAANLLLLGL--AVPSGGARGAAVACAASFWLFFVFKTESSCRLWQPLKRLPLYMHT

370 380 390 400 410 

420 430 440 450 460 470 

orf 10-1. pep LFCLTSSAAYTCFGTPANYPLFAGVWAAYLAGCILRHRKDLHKLFHYLKKQGFPLX 
I I I I : I I I II I I I I I I I I I I I I II II I : I I I I I I I I I I t II I I I t I II t I I 1 I I II 
orf 10a LFCLASSAAYTCFGTPANYPLFAGVWAVYLAGCILRHRKDLHKLFHYLKKQGFPLX 
420 430 440 450 460 470 
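
The identity figures quoted for these alignments can be reproduced, at least approximately, by counting matching columns. The following is a minimal sketch in Python and not the program used for the original analysis. The function name `percent_identity` is hypothetical, and the decision to treat a gap (`-`) or an undetermined residue (`X`) as a mismatch is an assumption made here. The two strings are the first 60 columns of the ORF10-1 / ORF10a alignment above.

```python
def percent_identity(aligned_a: str, aligned_b: str) -> float:
    """Percentage of alignment columns holding the same residue in both rows.

    Assumption (not from the source): gaps ('-') and undetermined
    residues ('X') never count as matches.
    """
    if len(aligned_a) != len(aligned_b):
        raise ValueError("aligned rows must have the same length")
    if not aligned_a:
        raise ValueError("alignment is empty")
    matches = sum(
        1
        for a, b in zip(aligned_a.upper(), aligned_b.upper())
        if a == b and a not in "-X"
    )
    return 100.0 * matches / len(aligned_a)


# First 60 alignment columns of ORF10-1 (top row) and ORF10a (bottom row).
orf10_1 = "MDTKEILXYAAGSIGSAVLAVIILPLLSWYFPADDIGRIVLMQTAAGLTVSVLCLGLDQA"
orf10a = "MDTKEILGYAAGSIGSAVLAVIILPLLSWYFPADDIGRIVLMQTAAGLTVSVLCLGLDQA"
print(f"{percent_identity(orf10_1, orf10a):.1f}% identity over {len(orf10a)} columns")
```

Applied to a full alignment, the same count would be taken over every column of the overlap; a different treatment of `X` or of gaps would shift the result slightly.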

Homology with a predicted ORF from N.gonorrhoeae

ORF10 shows 94.1% identity over a 475aa overlap with a predicted ORF (ORF10.ng) from N. gonorrhoeae:



orf10ng.pep MDTKEILGYAAGSIGSAVLAVIILPLLSWYFPADDIGRIVLMQTAAGLTVSVLCLGLDQA 60
            ||||||| ||||||||||||||||||||||||||||||||||||||||||||||||||||
orf10nm     MDTKEILXYAAGSIGSAVLAVIILPLLSWYFPADDIGRIVLMQTAAGLTVSVLCLGLDQA 60

orf10ng.pep YVREYYAAADKDTLFKTLFLPPLLFSAAIAALLLSRPSLPSEILFSLDDAAAGIGLVLFE 120
            |||||||:|||||||||||||||| :||||||||||||||||||||||||||||||||||
orf10nm     YVREYYATADKDTLFKTLFLPPLLSAAAIAALLLSRPSLPSEILFSLDDAAAGIGLVLFE 120

orf10ng.pep LSFLPIRFLLLVLRMEGRALAFSSAQLVPKLAILLLLPLTVGLLHFPANTSVLTAVYALA 180
            |||||||||||||||||||||||||||||||||||| |||||||||||||:|||||||||
orf10nm     LSFLPIRFLLLVLRMEGRALAFSSAQLVPKLAILLLXPLTVGLLHFPANTAVLTAVYALA 180

orf10ng.pep NLAAAAFLLFQNRCRLKAVRRAPFSPAVLHRGLRYGIPLALSSLAYWGLASADRLFLKKY 240
            ||||||||||||||||||||:||||||||||| |||||:||||:||||||||||||||||
orf10nm     NLAAAAFLLFQNRCRLKAVRHAPFSPAVLHRGXRYGIPIALSSIAYWGLASADRLFLKKY 240

orf10ng.pep AGLEQLGVYSMGISFGGAALLLQSIFSTVWTPYIFRAIEENATPARLSATAESAAALLAS 300
            |||||||||||||||||||||:|||||||||||||||||||| |||||||||||||||||
orf10nm     AGLEQLGVYSMGISFGGAALLFQSIFSTVWTPYIFRAIEENAPPARLSATAESAAALLAS 300

orf10ng.pep ALCLTGIFSPLASLLLPENYAAVRFTVVSCMLPPLFYTLTEISGIGLNVVRKTRPIALAT 360
            ||| ||||||||||||||||||||| ||||| |||| ||:||||||||||||||||||||
orf10nm     ALCXTGIFSPLASLLLPENYAAVRFIVVSCMXPPLFCTLAEISGIGLNVVRKTRPIALAT 360
370 380 390 400 410

orf10ng.pep LGALAANLLLLGL--AVPSGGTRGAAVACAASFWLFFVFKTESSCRLWQPLKRLPLYMHT
I t I I I I I I I I i t I lit: I I t I I I i I I I I I II : I II H I I I I I I I I I I I II I: I t 
orf10nm LGALAANLLLLGLDRAVPAR-PXGAAVACAASFWLFFAFKTESSCRLWQPLKRLPLYLHT

370 380 390 400 410 

420 430 440 450 460 470 

orf10ng.pep LFCLASSAAYTCFGTPANYPLFAGVWAAYLAGCILRHRKNLHKLFHYLKKQGFPLX
I i II : I I I I II t I I I M I I I I I I I I I II I I II t I I I I I I : I I I II I I I I II 11 t I t 
orf10nm LFCLTSSAAYTCFGTPANYPLFAGVWAAYLAGCILRHRKDLHKLFHYLKKQGFPLX
420 430 440 450 460 470 

The complete length ORF10ng nucleotide sequence <SEQ ID 379> is:

1 ATGGACACAA AAGAAATCCT CGGCTACGCG GCAGGCTCGA TCGGCAGCGC 

51 GGTTTTAGCC GTCATCATCC TGCCGCTGCT GTCGTGGTAT TTCcccgCCG 

101 ACGACATCGG GCGCATCGTG CTGATGCAGA CGGCGGCGGG ACTGACGGTG 

151 TCGGTATTGT GCCTCGGGCT GGATCAGGCA TACGTCCGCG AATACTATGC 

201 CGCCGCCGAC AAAGACACTT TGTTCAAAAC CCTGTTCCTG CCGCCGCTGC 

251 TGTTTTCCGC CGCGATAGCC GCCCTGCTGC TTTCCCGCCC GTCCCTGCCG 

301 TCTGAAATCC TGTTTTCGCT CGACGATGCC GCCGCCGGCA TCGGGCTGGT 

351 GCTGTTTGAA CTGAGCTTCC TGCCCATCCG CTTTCTCTTA CTGGTTTTGC 

401 GTATGGAAGG GCGCGCCCTT GCCTTTTCGT CCGCGCAACT CGTGCCCAAA 

451 CTCGCCATTC TGCTGCTGTT GCCGCTGACG GTCGGGCTGC TGCACTTTCC 

501 GGCGAACACC TCCGTCCTGA CCGCCGTTTA CGCGCTGGCA AACCTTGCCG 

551 CCGCCGCCTT TTTGCTGTTT CAAAACCGAT GCCGTCTGAA GGCCGTCCGG 

601 CGCGCGCCGT TTTCGCCCGC CGTCCTGCAC CGGGGGCTGC GCTACGGCAT 

651 ACCGCTCGCA CTGAGCAGCC TTGCCTATTG GGGGCTGGCA TCCGCCGACC 

701 GTTTGTTCCT GAAAAAATAT GCGGGCCTGG AACAGCTCGG CGTTTATTCG 

751 ATGGGTATTT CGTTCGGCGG GGCGGCATTA TTGCTCCAAA GCATCTTTTC 

801 AACGGTCTGG ACACCGTATA TTTTCCGTGC AATCGAAGAA AACGCCACGC 

851 CCGCCCGCCT CTCGGCAACG GCAGAATCCG CCGCCGCCCT GCTTGCCTCC 

901 GCCCTCTGCC TGACCGGAAT TTTCTCGCCC CTCGCCTCCC TCCTGCTGCC 

951 GGAAAACTAC GCCGCCGTCC GGTTTACCGT CGTATCGTGT ATGCTGccgc 

1001 cgctGTTTTA CACGCTGACC GAAATCAGCG GCATCGGTTT GAACGTCGTC 

1051 CGCAAAACGC GTCCGATCGC GCTTGCCACC TTGGGCGCGC TGGCGGCAAA 

1101 CCTGCTGCTG CTGGGGCTTG CCGTACCGTC CGGCGGCACG CGCGGCGCGG 

1151 CGGTTGCCTG TGCCGCCTCA TTCTGGTTGT TTTTTGTTTT CAAGACAGAA 

1201 AGCTCCTGCC GCCTGTGGCA GCCGCTCAAA CGCCTGCCGC TTTATATGCA 

1251 CACATTGTTC TGCCTgGCCT CCTCGGCGGC CTACACCTGC TTCGGCACAC 

1301 CGGCAAACTA CCCcctgttt gccggcgtAT GGGCGGCATA TCTGGCAGGC 

1351 TGCATCCTGC GCCACCGGAA AAATTTGCAC AAACTGTTTC ATTATTTGAA 

1401 AAAACAAGGT TTCCCATTAT GA 

This encodes a protein having amino acid sequence <SEQ ID 380>: 

1 MDTKEILGYA AGSIGSAVLA VIILPLLSWY FPADDIGRIV LMQTAAGLTV

51 SVLCLGLDQA YVREYYAAAD KDTLFKTLFL PPLLFSAAIA ALLLSRPSLP

101 SEILFSLDDA AAGIGLVLFE LSFLPIRFLL LVLRMEGRAL AFSSAQLVPK

151 LAILLLLPLT VGLLHFPANT SVLTAVYALA NLAAAAFLLF QNRCRLKAVR

201 RAPFSPAVLH RGLRYGIPLA LSSLAYWGLA SADRLFLKKY AGLEQLGVYS

251 MGISFGGAAL LLQSIFSTVW TPYIFRAIEE NATPARLSAT AESAAALLAS

301 ALCLTGIFSP LASLLLPENY AAVRFTVVSC MLPPLFYTLT EISGIGLNVV

351 RKTRPIALAT LGALAANLLL LGLAVPSGGT RGAAVACAAS FWLFFVFKTE

401 SSCRLWQPLK RLPLYMHTLF CLASSAAYTC FGTPANYPLF AGVWAAYLAG

451 CILRHRKNLH KLFHYLKKQG FPL*

ORF10ng and ORF10-1 show 96.4% identity in 473 aa overlap:

10 20 30 40 50 60 

orf 10-1 . pep MDTKEILGYAAGSIGSAVLAVIILPLLSWYFPADDIGRIVLMQTAAGLTVSVLCLGLDQA 
I I I I I I I t I I I I I I I I I I t I I I I I I I M t I i i I I I I M I II It I t I I I I I I I I I I II I II 
orflOng-1 MDTKEILGYAAGSIGSAVLAVIILPLLSWYFPADDIGRIVLMQTAAGLTVSVLCLGLDQA 

10 20 30 40 50 60 

70 80 90 100 110 120 

orf 10-1 . pep YVREYYATADKDTLFKTLFLPPLLSAAAIAALLLSRPSLPSEILFSLDDAAAGIGLVLFE 
I I I I I I I : I I I I 1 I I I I I I I I 1 I I : I M I I I I I t I I I I I I I I II t M I M t I I I I I I 1 I 
orflOng-1 YVREYYAAADKDTLFKTLFLPPLLFSAAIAALLLSRPSLPSEILFSLDDAAAGIGLVLFE 

70 80 90 100 110 120 
130 140 150 160 170 180

orf 10-1 . pep LSFLPIRFLLLVLRMEGRALAFSSAQLVPKLAILLLLPLTVGLLHFPANTAVLTAVYALA 
I I I I I I I I I M I I I I I I t I t I t I I I I I I t I t t I I I 11 I M I M I I t M I I : t I I I t I I M 
orflOng-1 LSFLPIRFLLLVLRMEGRALAFSSAQLVPKLAILLLLPLTVGLLHFPANTSVLTAVYALA 

130 140 150 160 170 180 

190 200 210 220 230 240 

orf 10-1. pep NLAAAAFLLFQNRCRLKAVRHAPFSPAVLHRGLRYGIPIALSSIAYWGLASADRLFLKKY 
t M I M I ) I I I I I ) I I i I I I : I I I M I I M I I I I t I I I : I I I I : M I i I I I I I I I I I I I I 
orflOng-1 NLAAAAFLLFQNRCRLKAVRRAPFSPAVLHRGLRYGIPLALSSLAYWGLASADRLFLKKY 

190 200 210 220 230 240 

250 260 270 280 290 300 

orf 10-1 . pep AGLEQLGVYSMGISFGGAALLFQSIFSTVWTPYIFRAIEENAPPARLSATAESAAALLAS 
M t I [ I I I I I I I t I I I I I I I I : I I I I I I i I I I I I I M I i I i 1 I I I I I I t I I i i I I i I I I 
orflOng-1 AGLEQLGVYSMGISFGGAALLLQSIFSTVWTPYIFRAIEENATPARLSATAESAAALLAS 

250 260 270 280 290 300 

310 320 330 340 350 360 

orf 10-1 . pep ALCLTGIFSPIASLLLPENYAAVRFIWSCMLPPLFCTLAEISGIGLNVVRKTRPIALAT 
I I I t I I I I t I I I I I I I I t i I I I I I I I I I I I i I I I t I I: I M I M I t I M I I I I I I I I I 
orflOng-1 ALCLTGIFSPLASLLLPENYAAVRFTWSCMLFPLFYTLTEISGIGLNWRKTRPIALAT 

310 320 330 340 350 360 

370 380 390 400 410 420 

orf 10-1 . pep LGALAANLLLLGLAVPSGGTVRGAAVACAASFWLFFAFKTESSCRLWQPLKRLPLYLHTLF 
I I M I M I I I I I I I I I I I I: I I I I I i I I i I I I I I I : I I I I I I i i I t i i I t M I I I : M t I 
orflOng-1 LGALAANLLLLGLAVPSGGTRGAAVACAASFWLFFVFKTESSCRLWQPLKRLPLYMHTLF 

370 380 390 400 410 420 

430 440 450 460 470 

orf 10-1 . pep CLTSSAAYTCFGTPANYPLFAGVWAAYLAGCILRHRKDLHKLFHYLKKQGFPLX 
I I : 11 i I M I t I I I I t I I I I I I I I t M I I t I I I I t I i : I I I I I t t I t M I t I I I 
orflOng-1 CLASSAAYTCFGTPANYPLFAGVWAAYLAGCILRHRKNLHKLFHYLKKQGFPLX 

430 440 450 460 470 



Based on this analysis, including the presence of a putative leader peptide and several 
transmembrane segments and the presence of a leucine-zipper motif (4 Leu residues spaced by 6 
aa, shown in bold), it is predicted that these proteins from N.meningitidis and N.gonorrhoeae, and
their epitopes, could be useful antigens for vaccines or diagnostics, or for raising antibodies. 
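
The leucine-zipper criterion mentioned above (4 Leu residues spaced by 6 aa) can be checked mechanically. The sketch below is an illustration only: it assumes the criterion means the pattern L-x(6)-L-x(6)-L-x(6)-L, that is, a leucine at every seventh position, and the function name `find_leucine_heptads` is hypothetical rather than taken from the original analysis.

```python
def find_leucine_heptads(seq: str, repeats: int = 4, spacing: int = 7) -> list[int]:
    """Return 1-based start positions where `repeats` leucines occur at
    intervals of `spacing` residues (6 intervening residues by default)."""
    seq = seq.upper()
    span = spacing * (repeats - 1)  # distance from first to last leucine
    hits = []
    for start in range(len(seq) - span):
        if all(seq[start + k * spacing] == "L" for k in range(repeats)):
            hits.append(start + 1)
    return hits


# Synthetic example: leucines at positions 4, 11, 18 and 25.
example = "AAA" + "LAAAAAA" * 3 + "L" + "AA"
print(find_leucine_heptads(example))
```

A hit from such a scan is only a sequence-level candidate; whether the region actually forms a zipper depends on its structural context.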



Example 45 



The following partial DNA sequence was identified in N.meningitidis <SEQ ID 381>:



1 ..ATCCTGAAAC CGCATAACCA GCTTAAGGAA GACATCCAAC CTGATCCGGC

51 CGATCAAAAC GCCTTGTCCG AACCGGATGC TGCGACAGAG GCAGAGCAGT

101 CGGATGCGGA AAATGCTGCC GACAAGCAGC CCGTTGCCGA TAAAGCCGAC

151 GAGGTTGAAG AAAAGGCGGG CGAGCCGGAA CGGGAAGAGC CGGACGGACA 

201 GGCAGTGCGT AAGAAAGCGC TGACGGAAGA GCGTGAACAA ACCGTCAGGG 

251 AAAAAGCGCA GAAGAAAGAT GCCGAAACGG TTAAAATACA AGCGGTAAAA 

301 CCGTCTAAAG AAACAGAGAA AAAAGCTTCA AAAGAAGAGA AAAAGGCGGG 

351 GAAGGAAAAA GTTGCACCCA AACCAACCCC GGAACAAATC CTCAACAGCG 

401 GCAgCATCGA AAAmGCGCGC AgTGCCGCCG CCAAAGAAGT GCAGAAAATG 

451 AA.AACGTCC GACAAGGCGG AAGC.AACGC ATTATCTGCA AATGGGCGCG 

501 TATGCCGACC GTCAGAGCGC GGAAGGGCAG CGTGCCAAAC TGGCAATCTT 

551 GGGCATATCT TCCAAGGTGG TCGGTTATCA GGCGGGACAT AAAACGCTTT 

601 ACCGGGTGCA AAGCGGCAAT ATGTCTGCCG ATGCGGTGA 

This corresponds to the amino acid sequence <SEQ ID 382; ORF65>: 



1 ..ILKPHNQLKE DIQPDPADQN ALSEPDAATE AEQSDAENAA DKQPVADKAD
51 EVEEKAGEPE REEPDGQAVR KKALTEEREQ TVREKAQKKD AETVKIQAVK
101 PSKETEKKAS KEEKKAAKEK VAPKPTPEQI LNSGSIEXAR SAAAKEVQKM
151 XNVRQGGSXR IICKWARMPT VRARKGSVPN WQSWAYLPRW SVIRRDIKRF
201 TGCKAAICLP MR*

Further work revealed the complete nucleotide sequence <SEQ ID 383>: 



1 ATGTTTATGA ACAAATTTTC CCAATCCGGA AAAGGTCTGT CCGGTTTTTT 

51 CTTCGGTTTG ATACTGGCGA CGGTCATTAT TGCCGGTATT TTGTTTTATC 

101 TGAACCAGAG CGGTCAAAAT GCGTTCAAAA TCCCGGCTTC GTCGAAGCAG 

151 CCTGCAGAAA CGGAAATCCT GAAACCGAAA AACCAGCCTA AGGAAGACAT 

201 CCAACCTGAA CCGGCCGATC AAAACGCCTT GTCCGAACCG GATGCTGCGA 

251 CAGAGGCAGA GCAGTCGGAT GCGGAAAAAG CTGCCGACAA GCAGCCCGTT 

301 GCCGATAAAG CCGACGAGGT TGAAGAAAAG GCGGGCGAGC CGGAACGGGA 

351 AGAGCCGGAC GGACAGGCAG TGCGTAAGAA AGCGCTGACG GAAGAGCGTG 

401 AACAAACCGT CAGGGAAAAA GCGCAGAAGA AAGATGCCGA AACGGTTAAA 

451 AAACAAGCGG TAAAACCGTC TAAAGAAACA GAGAAAAAAG CTTCAAAAGA 

501 AGAGAAAAAG GCGGCGAAGG AAAAAGTTGC ACCCAAACCA ACCCCGGAAC

551 AAATCCTCAA CAGCGGCAGC ATCGAAAAAG CGCGCAGTGC CGCCGCCAAA 

601 GAAGTGCAGA AAATGAAAAC GTCCGACAAG GCGGAAGCAA CGCATTATCT 

651 GCAAATGGGC GCGTATGCCG ACCGTCAGAG CGCGGAAGGG CAGCGTGCCA

701 AACTGGCAAT CTTGGGCATA TCTTCCAAGG TGGTCGGTTA TCAGGCGGGA 

751 CATAAAACGC TTTACCGGGT GCAAAGCGGC AATATGTCTG CCGATGCGGT 

801 GAAAAAAATG CAGGACGAGT TGAAAAAACA TGAAGTCGCC AGCCTGATCC 

851 GTTCTATCGA AAGCAAATAA 

This corresponds to the amino acid sequence <SEQ ID 384; ORF65-1>:



1 MFMNKFSQSG KGLSGFFFGL ILATVIIAGI LFYLNQSGQN AFKIPASSKQ

51 PAETEILKPK NQPKEDIQPE PADQNALSEP DAATEAEQSD AEKAADKQPV

101 ADKADEVEEK AGEPEREEPD GQAVRKKALT EEREQTVREK AQKKDAETVK

151 KQAVKPSKET EKKASKEEKK AAKEKVAPKP TPEQILNSGS IEKARSAAAK

201 EVQKMKTSDK AEATHYLQMG AYADRQSAEG QRAKLAILGI SSKVVGYQAG

251 HKTLYRVQSG NMSADAVKKM QDELKKHEVA SLIRSIESK*

Computer analysis of this amino acid sequence gave the following results: 



Homology with a predicted ORF from N.meningitidis (strain A)

ORF65 shows 92.0% identity over a 150aa overlap with an ORF (ORF65a) from strain A of N.
meningitidis:



10 20 30 

or f 65 . pep ILKPHNQLKEDIQPDPADQNALSEPDAATE 

illt:ll IDIIhtMIIIIIIIIII I 
orf65a IIAGILF YLNQSGQNAFKIPVPSKQPAETEILKPKNQPKEDIQPEPADQNALSEPDAAKE 
30 40 50 60 70 80 



40 50 60 70 80 90 

orf 65 . pep AEQSDAENAADKQPVADKADEVEEKAGEPEREEPDGQAVRKKALTEEREQTVREKAQKKD
lllltlt:|llllitlllltilttll mil: ItllllllllilllMII iiiini 
orf 65a AEQSDAEKAADKQPVADKADEVEEKADEPEREKSDGQAVRKKALTEEREQTVGEKAQKKD 
90 100 110 120 130 140 



100 110 120 130 140 150 

orf 65 .pep AETVKIQAVKPSKETEKKASKEEKKAAKEKVAPKPTPEQILNSGSIEXARSAAAKEVQKM 
Mill I II t I I I II II I II I I I I I I I II I I II I I t I I t I I I I I I I I I I I I II I I I II 
orf 65a AETVKKQAVKPSKETEKKASKEEKKAEKEKVAPKPTPEQILNSGSIEKARSAAAKEVQKM 
150 160 170 180 190 200 

160 170 180 190 200 210 

orf 65. pep XNVRQGGSXRIICKWARMPTVRARKGSVPNWQSWAYLPRWSVIRRDIKRFTGCKAAICLP 

orf 65a KTPDKAEATHYLQMGAYADRRSAEGQRAKLAILGISSKVVGYQAGHKTLYRVQSGNMSAD
210 220 230 240 250 260 

The complete length ORF65a nucleotide sequence <SEQ ID 385> is: 



1 ATGTTTATGA ACAAATTTTC CCAATCCGGA AAAGGTCTGT CCGGTTTTTT 
51 CTTCGGTTTG ATACTGGCGA CGGTCATTAT TGCCGGTATT TTGTTTTATC 
101 TGAACCAGAG CGGTCAAAAT GCGTTCAAAA TCCCGGTTCC GTCGAAGCAG

151 CCTGCAGAAA CGGAAATCCT GAAACCGAAA AACCAGCCTA AGGAAGACAT 

201 CCAACCTGAA CCGGCCGATC AAAACGCCTT GTCCGAACCG GATGCTGCGA 

251 AAGAGGCAGA GCAGTCGGAT GCGGAAAAAG CTGCCGACAA GCAGCCCGTT 

301 GCCGACAAAG CCGACGAGGT TGAGGAAAAG GCGGACGAGC CGGAGCGGGA 

351 AAAGTCGGAC GGACAGGCAG TGCGCAAGAA AGCACTGACG GAAGAGCGTG 

401 AACAAACCGT CGGGGAAAAA GCGCAGAAGA AAGATGCCGA AACGGTTAAA 

451 AAACAAGCGG TAAAACCATC TAAAGAAACA GAGAAAAAAG CTTCAAAAGA 

501 AGAGAAAAAG GCGGAGAAGG AAAAAGTTGC ACCCAAACCG ACCCCGGAAC 

551 AAATCCTCAA CAGCGGCAGC ATCGAAAAAG CGCGCAGTGC CGCTGCCAAA 

601 GAAGTGCAGA AAATGAAAAC GCCCGACAAG GCGGAAGCAA CGCATTATCT 

651 GCAAATGGGC GCGTATGCCG ACCGCCGGAG CGCGGAAGGG CAGCGTGCCA 

701 AACTGGCAAT CTTGGGCATA TCTTCCAAGG TGGTCGGTTA TCAGGCGGGA

751 CATAAAACGC TTTACCGGGT GCAAAGCGGC AATATGTCTG CCGATGCGGT 

801 GAAAAAAATG CAGGACGAGT TGAAAAAACA TGAAGTCGCC AGCCTGATCC 

851 GTTCTATCGA AAGCAAATAA 

This encodes a protein having amino acid sequence <SEQ ID 386>: 



1 MFMNKFSQSG KGLSGFFFGL ILATVIIAGI LFYLNQSGQN AFKIPVPSKQ

51 PAETEILKPK NQPKEDIQPE PADQNALSEP DAAKEAEQSD AEKAADKQPV

101 ADKADEVEEK ADEPEREKSD GQAVRKKALT EEREQTVGEK AQKKDAETVK

151 KQAVKPSKET EKKASKEEKK AEKEKVAPKP TPEQILNSGS IEKARSAAAK

201 EVQKMKTPDK AEATHYLQMG AYADRRSAEG QRAKLAILGI SSKVVGYQAG

251 HKTLYRVQSG NMSADAVKKM QDELKKHEVA SLIRSIESK*

ORF65a and ORF65-1 show 96.5% identity in 289 aa overlap:



10 20 30 40 50 60 

orf 65a . pep MFMNKFSQSGKGLSGFFFGLILATVIIAGILFYLNQSGQNAFKIPVPSKQPAETEILKPK 
I I i I I t M I I I I I t I I I I M I I I I I I I I I I I I I I I I I I I I I I I I 1 : I I I I I I I I I I I I I 
orf 65-1 MFMNKFSQSGKGLSGFFFGLILATVIIAGILFYLNQSGQNAFKIPASSKQPAETEILKPK 

10 20 30 40 50 60 



70 80 90 100 110 120 

orf 65a . pep NQPKEDIQPEPADQNALSEPDAAKEAEQSDAEKAADKQPVADKADEVEEKADEPEREKSD
I t I I I I ) t I i I M I 1 I I I I I i I I I t I I I I I I i I 1) I I t I I I I I [ t I M I t Hill: I 
orf 65-1 NQPKEDIQPEPADQNALSEPDAATEAEQSDAEKAADKQPVADKADEVEEKAGEPEREEPD 

70 80 90 100 110 120 



130 140 150 160 170 180 

orf 65a . pep GQAVRKKALTEEREQTVGEKAQKKDAETVKKQAVKPSKETEKKASKEEKKAEKEKVAPKP
I I i I I I I I t i I M [ I I I I I i I I I I I I 1 I It i I I I I I It I M I I t M I I I I t I II I I I 1 
orf 65-1 GQAVRKKALTEEREQTVREKAQKKDAETVKKQAVKPSKETEKKASKEEKKAAKEKVAPKP 

130 140 150 160 170 180 



190 200 210 220 230 240 

orf 65a . pep TPEQILNSGSIEKARSAAAKEVQKMKTPDKAEATHYLQMGAYADRRSAEGQRAKLAILGI 
II I I I I i I M I I I I I I I I I I t i I I I II I M I I I I I II I I I t I I ) : I I I t i I I I t It I I t 
orf 65-1 TPEQILNSGSIEKARSAAAKEVQKMKTSDKAEATHYLQMGAYADRQSAEGQRAKLAILGI 

190 200 210 220 230 240 



250 260 270 280 290 

orf 65a . pep SSKWGYQAGHKTLYRVQSGNMSADAVKKMQDELKKHEVASLIRSIESKX 
I I I I II II I I I I II I I I I I I I I I I I I i I i I i I I I II I [ II t I It I t I I II 
orf 65-1 SSKWGYQAGHKTLYRVQSGNMSADAVKKMQDELKKHEVASLIRSIESKX 

250 260 270 280 290 



Homology with a predicted ORF from N.gonorrhoeae

ORF65 shows 89.6% identity over a 212aa overlap with a predicted ORF (ORF65.ng) from N.
gonorrhoeae:



30 40 50 60 70 80 

ORF65ng IIAGILLYLNQGGQNAFKIPAPSKQPAETEILKLKNQPKEDIQPEPADQNALSEPDVAKE 

III : I I 1 It II I: I I I I I t I I 1 I 1 : I I 
ORF65 ILKPHNQLKEDIQPDPADQNALSEPDAATE

10 20 30 
90 100 110 120 130 140

ORF65ng AEQSDAEKAADKQPVADKADEVEEKAGEPEREEPDGQAVRKKALTEEREQTVREKAQKKD 

n I I I I I : t I I I I t I I I i I I M I I I I I I i I I I I I [ i I I t I I i I I t I I I I t I t I I I t I I I I 
ORF65 AEQSDAENAADKQPVADKADEVEEKAGEPEREEPDGQAVRKKALTEEREQTVREKAQKKD 
40 50 60 70 80 90

150 160 170 180 190 200 

ORF65ng AETVKKKAVKPSKETEKKASKEEKKAAKEKVAPKPTPEQILNSRSIEKARSAAAKEVQKM

Mill : I I I I I I I I I t I I I I I I I I I I I M I I I I N I I I I I I i IN I I It 1 I I I I M I 
ORF65 AETVKIQAVKPSKETEKKASKEEKKAAKEKVAPKPTPEQILNSGSIEXARSAAAKEVQKM

100 110 120 130 140 150 



210 220 230 240 250 260 

ORF65ng KNFGQGGSQRIICKWARMPNPGARKGSVPNWQSWAYLPKWSAIRRDIKRFTACKAAICPP 
15 t I M I I I I It I t I I I : lltiMMMilllll:||:tlllllll|:iMilt t 

ORF65 XNVRQGGSXRIICKWARMPTVRARKGSVPNWQSWAYLPRWSVIRRDIKRFTGCKAAICLP

160 170 180 190 200 210 



ORF65ng MR 

20 II 

ORF65 MR 

An ORF65ng nucleotide sequence <SEQ ID 387> was predicted to encode a protein having amino
acid sequence <SEQ ID 388>:

1 MFMNKFSQSG KGLSGFFFGL ILATVIIAGI LLYLNQGGQN AFKIPAPSKQ

51 PAETEILKLK NQPKEDIQPE PADQNALSEP DVAKEAEQSD AEKAADKQPV

101 ADKADEVEEK AGEPEREEPD GQAVRKKALT EEREQTVREK AQKKDAETVK

151 KKAVKPSKET EKKASKEEKK AAKEKVAPKP TPEQILNSRS IEKARSAAAK

201 EVQKMKNFGQ GGSQRIICKW ARMPNPGARK GSVPNWQSWA YLPKWSAIRR

251 DIKRFTACKA AICPPMR*



After further analysis, the complete gonococcal DNA sequence <SEQ ID 389> was found to be:

1 ATGTTTATGA ACAAATTTTC CCAATCCGGA AAAGGTCTGT CCGGTTTCTT
51 CTTCGGTTTG ATACTGGCAA CGGTCATTAT TGCCGGTATT TTGCTTTATC
101 TGAACCAGGG CGGTCAAAAT GCGTTCAAAA TCCCGGCTCC GTCGAAGCAG
151 CCTGCAGAAA CGGAAATCCT GAAACTGAAA AACCAGCCTA AGGAAGACAT
201 CCAACCTGAA CCGGCCGATC AAAACGCCTT GTCCGAACCG GATGTTGCGA
251 AAGAGGCAGA GCAGTCGGAT GCGGAAAAAG CTGCCGACAA GCAGCCCGTT
301 GCCGACAAag ccgacgAGGT TGAAGAAAag GcGGgcgAgc cggaACGGga
351 aGAGCCGGAC ggACAGGCAG TGCGCAAGAA AGCACTGAcg gAAGAgcGTG
401 AACAAACcgt cagggAAAAA GCGCagaaga AAGATGCCGA AACGgTTAAA
451 AAacaaGCgg tAaaaccgtc tAAAGAAACa gagaaaaaag cTtcaaaaga
501 agagaaaaag gcggcgaaag aaaAAGttgc acccaaaccg accccggaaC
551 aaatcctcaa cagccgCagc atcgaaaaag cgcgtagtgc cgctgccaaa
601 gaAgtgcaGA AAatgaaaaa ctTtgggcaa ggcgGaagcc aacgcattaT
651 CTGcaaatgg gcgcgtatgc cgaccgtccg gagcgcggaA gggcagcgtg
701 ccaaACtggc aAtcttgGgc atatctTccg aagtggtcgG CTATCAGGCG
751 GGACATAAAA CGCTTTACCG CGTGCAAagc GGCAatatgt ccgccgatgc
801 gGTGAAAAAA ATGCAGGACG AGTTGAAAAA GCATGGGGtt gcCAGCCTGA
851 TCCGTGcgAT TGAAGGCAAA TAA

This encodes the following amino acid sequence <SEQ ID 390>:

1 MFMNKFSQSG KGLSGFFFGL ILATVIIAGI LLYLNQGGQN AFKIPAPSKQ
51 PAETEILKLK NQPKEDIQPE PADQNALSEP DVAKEAEQSD AEKAADKQPV
101 ADKADEVEEK AGEPEREEPD GQAVRKKALT EEREQTVREK AQKKDAETVK
151 KQAVKPSKET EKKASKEEKK AAKEKVAPKP TPEQILNSRS IEKARSAAAK
201 EVQKMKNFGQ GGSQRIICKW ARMPTVRSAE GQRAKLAILG ISSEVVGYQA
251 GHKTLYRVQS GNMSADAVKK MQDELKKHGV ASLIRAIEGK



ORF65ng-1 and ORF65-1 show 89.0% identity in 290 aa overlap:



10 20 30 40 50 60 

orf 65-1 . pep MFMNKFSQSGKGLSGFFFGLILATVIIAGILFYLNQSGQNAFKIPASSKQPAETEILKPK 
I I I I I I It I I i I I M I I I I [ I I i I I I I I I I I : I i I I : M I t t I I I I I M t I 1 I I I I I t 
orf65ng-1 MFMNKFSQSGKGLSGFFFGLILATVIIAGILLYLNQGGQNAFKIPAPSKQPAETEILKLK

10 20 30 40 50 60 
70 80 90 100 110 120

orf 65-1 . pep NQPKEDIQPEPADQNALSEPDAATEAEQSDAEKAADKQPVADKADEVEEKAGEPEREEPD 
MMIMllllllMlllllt:! I M I ) t I t I I i I I i I I I I I i I I I I I I I I I I I t I I I I 
orf65ng-1 NQPKEDIQPEPADQNALSEPDVAKEAEQSDAEKAADKQPVADKADEVEEKAGEPEREEPD

70 80 90 100 110 120 

130 140 150 160 170 180 

orf 65-1 . pep GQAVRKKALTEEREQTVREKAQKKDAETVKKQAVKPSKETEKKASKEEKKAAKEKVAPKP
I I I I I I I I t I M M I i I I I I I I I I I I I I I I I M I I I 1 I I I I I M I t I I t t i I I I I I I t M 
orf65ng-1 GQAVRKKALTEEREQTVREKAQKKDAETVKKQAVKPSKETEKKASKEEKKAAKEKVAPKP

130 140 150 160 170 180 

190 200 210 220 230 239 

orf 65-1 . pep TPEQILNSGSIEKARSAAAKEVQKMKTSDKAEATHYL-QMGAYADRQSAEGQRAKLAILG 
lllilllt llltinillillllll: :::::::: : t I I I I M I I I I I t 
orf65ng-1 TPEQILNSRSIEKARSAAAKEVQKMKNFGQGGSQRIICKWARMPTVRSAEGQRAKLAILG

190 200 210 220 230 240 

240 250 260 270 280 290 

orf 65-1 . pep ISSKVVGYQAGHKTLYRVQSGNMSADAVKKMQDELKKHEVASLIRSIESKX
I i t : I I I I I M 1 M I I i I I M I i I t } I I I i I I I I I I I I I I I I I I : I I : i I 
orf65ng-1 ISSEVVGYQAGHKTLYRVQSGNMSADAVKKMQDELKKHGVASLIRAIEGKX
250 260 270 280 290 



On this basis, including the presence of a putative transmembrane domain in the gonococcal 
protein, it is predicted that the proteins from N.meningitidis and N.gonorrhoeae, and their epitopes,
could be useful antigens for vaccines or diagnostics, or for raising antibodies. 
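
The text does not state how the putative transmembrane domain was predicted. As one common way to screen for such regions, the sketch below averages Kyte-Doolittle hydropathy values over a sliding window. The window length of 19 and the threshold of 1.6 are conventional heuristics and are assumptions here, as are the function names and the choice to score unknown residues as 0.0. The test string is the N-terminal 50 residues of the gonococcal protein <SEQ ID 390>.

```python
# Kyte-Doolittle hydropathy index for the 20 standard amino acids.
KYTE_DOOLITTLE = {
    "A": 1.8, "R": -4.5, "N": -3.5, "D": -3.5, "C": 2.5,
    "Q": -3.5, "E": -3.5, "G": -0.4, "H": -3.2, "I": 4.5,
    "L": 3.8, "K": -3.9, "M": 1.9, "F": 2.8, "P": -1.6,
    "S": -0.8, "T": -0.7, "W": -0.9, "Y": -1.3, "V": 4.2,
}


def hydropathy_profile(seq: str, window: int = 19) -> list[float]:
    """Mean hydropathy of every full window along the sequence.

    Assumption: residues without a table entry (e.g. 'X', '*') score 0.0.
    """
    scores = [KYTE_DOOLITTLE.get(aa, 0.0) for aa in seq.upper()]
    return [
        sum(scores[i:i + window]) / window
        for i in range(len(scores) - window + 1)
    ]


def hydrophobic_windows(seq: str, window: int = 19, threshold: float = 1.6) -> list[int]:
    """1-based start positions of windows whose mean hydropathy >= threshold."""
    profile = hydropathy_profile(seq, window)
    return [i + 1 for i, mean in enumerate(profile) if mean >= threshold]


# N-terminal 50 residues of the gonococcal protein <SEQ ID 390>.
n_terminus = "MFMNKFSQSGKGLSGFFFGLILATVIIAGILLYLNQGGQNAFKIPAPSKQ"
print(hydrophobic_windows(n_terminus))
```

Windows that pass the threshold are only candidates for membrane-spanning segments; dedicated topology predictors use additional information.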



Example 46 

The following DNA sequence, believed to be complete, was identified in N.meningitidis <SEQ ID
391>:



1 ATGAACCACG ACATCACTTT CCTCACCCTG TTCCTACTCG GTkTCTTCGG 

51 CGGAAcGCAC TGCATCGGTA TGTGCGGCGG ATTAAGCAGC GcGTTTGs.s 

101 TCCAACTCCC CCCGCATATC AACCGCTTTT GGCTGATCCT GCTGCTTAAC 

151 ACAGGACGGG TAAGCAGCTA TACGGCAAtC GGCCTGATAC TCGGATTAAT 

201 CGGACAGGTC GGCGTTTCAC TCGAcCAaAC CCGCGTCCTG CAGAATATTT 

251 TATACACGGC CGCCAACCTC CTGCTGCTCT TTTTAGGCTT ATACTTGAGC 

301 GGTATTTCTT CCTTGGCGGC AAAAATCGAG AAaATCGGCA AACCGATATG 

351 GCGGAACCTG AACCCGATAC TCAACCGGCT GTTACCCATA AAATCCATAC 

401 CCGCCTGCCT tGCGgTCGGA ATATTATGGG GCTGGCTGCC GTGCGGACTG 

451 GTTTACAGCG CGTCGCTTTA CGCGCTGGGA AgCGGTAGTG CGGCAACGGG 

501 CGGGTTATAT ATGCTTGCCT TTGCACTGGG TACGCTGCCC AATCTTtTAG 

551 CAATCGGCAT TTTtTCCCTG CAACTGAAwA AAATCATGCA AAACCGATAT 

601 ATCCGCCTGT GTACGGGATT ATCCGTATCA TTATGGGCAT TATGGAAACT 

651 TGCCGTCCTG TGGCTGTAA 

This corresponds to the amino acid sequence <SEQ ID 392; ORF103>: 



1 MNHDITFLTL FLLGXFGGTH CIGMCGGLSS AFXXQLPPHI NRFWLILLLN 

51 TGRVSSYTAI GLILGLIGQV GVSLDQTRVL QNILYTAANL LLLFLGLYLS 

101 GISSLAAKIE KIGKPIWRNL NPILNRLLPI KSIPACLAVG ILWGWLPCGL 

151 VYSASLYALG SGSAATGGLY MLAFALGTLP NLLAIGIFSL QLXKIMQNRY 

201 IRLCTGLSVS LWALWKLAVL WL* 
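
The correspondence between a nucleotide listing and its amino acid sequence can be checked by translating in frame. The sketch below assumes the standard genetic code and is not the software used for the original work; `translate` and `CODON_TABLE` are names chosen here. Any codon containing a base other than A, C, G or T (such as the lower-case ambiguity codes in <SEQ ID 391>) is rendered as `X`, mirroring the `X` residues in <SEQ ID 392>.

```python
from itertools import product

# Standard genetic code, codons enumerated in TCAG order.
_BASES = "TCAG"
_AMINO = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    "".join(codon): aa for codon, aa in zip(product(_BASES, repeat=3), _AMINO)
}


def translate(dna: str) -> str:
    """Translate a DNA string in frame 1.

    Whitespace is ignored, so listing rows can be pasted as printed.
    Codons with ambiguous bases become 'X'; stop codons become '*';
    trailing bases that do not fill a codon are dropped.
    """
    dna = "".join(dna.split()).upper()
    return "".join(
        CODON_TABLE.get(dna[i:i + 3], "X") for i in range(0, len(dna) - 2, 3)
    )


# First row of <SEQ ID 391>, including the ambiguous base 'k'.
row_1 = "ATGAACCACG ACATCACTTT CCTCACCCTG TTCCTACTCG GTkTCTTCGG"
print(translate(row_1))
```

The printed peptide can be compared with the start of <SEQ ID 392>; the ambiguous codon yields the `X` at residue 15.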

Further work elaborated the DNA sequence <SEQ ID 393> as: 



1 ATGAACCACG ACATCACTTT CCTCACCCTG TTCCTACTCG GTTTCTTCGG 

51 CGGAACGCAC TGCATCGGTA TGTGCGGCGG ATTAAGCAGC GCGTTTGCGC 

101 TCCAACTCCC CCCGCATATC AACCGCTTTT GGCTGATCCT GCTGCTTAAC 

151 ACAGGACGGG TAAGCAGCTA TACGGCAATC GGCCTGATAC TCGGATTAAT 

201 CGGACAGGTC GGCGTTTCAC TCGACCAAAC CCGCGTCCTG CAGAATATTT 

251 TATACACGGC CGCCAACCTC CTGCTGCTCT TTTTAGGCTT ATACTTGAGC 

301 GGTATTTCTT CCTTGGCGGC AAAAATCGAG AAAATCGGCA AACCGATATG 
351 GCGGAACCTG AACCCGATAC TCAACCGGCT GTTACCCATA AAATCCATAC

401 CCGCCTGCCT TGCGGTCGGA ATATTATGGG GCTGGCTGCC GTGCGGACTG 

451 GTTTACAGCG CGTCGCTTTA CGCGCTGGGA AGCGGTAGTG CGGCAACGGG 

501 CGGGTTATAT ATGCTTGCCT TTGCACTGGG TACGCTGCCC AATCTTTTAG 

551 CAATCGGCAT TTTTTCCCTG CAACTGAAAA AAATCATGCA AAACCGATAT 

601 ATCCGCCTGT GTACGGGATT ATCCGTATCA TTATGGGCAT TATGGAAACT 

651 TGCCGTCCTG TGGCTGTAA 

This corresponds to the amino acid sequence <SEQ ID 394; ORF103-1>: 

1 MNHDITFLTL FLLGFFGGTH CIGMCGGLSS AFALQLPPHI NRFWLILLLN

51 TGRVSSYTAI GLILGLIGQV GVSLDQTRVL QNILYTAANL LLLFLGLYLS

101 GISSLAAKIE KIGKPIWRNL NPILNRLLPI KSIPACLAVG ILWGWLPCGL

151 VYSASLYALG SGSAATGGLY MLAFALGTLP NLLAIGIFSL QLKKIMQNRY

201 IRLCTGLSVS LWALWKLAVL WL*

Computer analysis of this amino acid sequence gave the following results: 
Homology with a predicted ORF from N.meningitidis (strain A)

ORF103 shows 93.8% identity over a 222aa overlap with an ORF (ORF103a) from strain A of N. 
meningitidis: 

10 20 30 40 50 60 

orf 103 . pep MNHDITFLTLFLLGXFGGTHCIGMCGGLSSAFXXQLPPHINRFWLILLLNTGRVSSYTAI 
II I I I I I I I I M I I M I I I 1 I i I II I II II II I I II II I II II I II I II I I II I I 
orf 103a MNXDITFLTLFLLGFFGGTHCIGMCGGLSSAFALQLPPHINRXWLILLLNTGRVSSYTAI 

10 20 30 40 50 60 

70 80 90 100 110 120 

orf 103 . pep GLILGLIGQVGVSLDQTRVLQNILYTAANLLLLFLGLYLSGISSLAAKIEKIGKPIWRNL 
I I t I I I I t I I I I I I I I II I I I I t I I I I I II I I I I I I I I I II I I I I I I t I I II t i I t I I I 
orf 103a GLILGLIGQVGVSLDQTRVXQNILYTAANLLLLFLGLYLSGISSLAAKIEKIGKPIWRNL 

70 80 90 100 110 120 

130 140 150 160 170 180 

orf 103 . pep NPILNRLLPIKSIPACLAVGILWGWLPCGLVYSASLYALGSGSAATGGLYMLAFALGTLP 

I I I I I I I I I I I I I I I I I I I I t I I I I I I II II t II I I I I I I I I II I I I I II i I I I I I I i I I 
orf 103a NPILNRLLPIKSIPACLAVGILWGWLPCGLVYSASLYALGSGSAATGGLYMLAFALGTLP 

130 140 150 160 170 180 

190 200 210 220 

orf 103 . pep NLLAIGIFSLQLXKIMQNRYIRLCTGLSVSLWALWKLAVLWLX 

II I I II M I I I I I I I M I I I I I I I I I I I I I I 1 I I I I I t I I i I 
orf 103a NLXAIGIFSLQLXKIMQNRYIRLCTGLSVSLWALWKLAVLWLX 

190 200 210 220 

The complete length ORF 103a nucleotide sequence <SEQ ID 395> is: 

1 ATGAACCANG ACATCACTTT CCTCACCCTG TTCCTACTCG GTTTCTTCGG 

51 CGGAACGCAC TGCATCGGTA TGTGCGGCGG ATTAAGCAGC GCGTTTGCGC 

101 TCCAACTCCC CCCGCATATC AACCGCTTNT GGCTGATCCT GCTGCTTAAC 

151 ACAGGACGGG TAAGCAGCTA TACGGCAATC GGCCTGATAC TCGGATTAAT 

201 CGGACAGGTC GGCGTTTCAC TCGACCAAAC CCGCGTCNTG CAGAATATTT 

251 TATACACGGC CGCCAACCTC CTGCTGCTCT TTTTAGGCTT ATACTTGAGC 

301 GGTATTTCTT CCTTGGCGGC AAAAATCGAG AAAATCGGCA AACCGATATG 

351 GCGGAACCTG AACCCGATAC TCAACCGGCT GTTACCCATA AAATCCATAC 

401 CCGCCTGCCT TGCGGTCGGA ATATTATGGG GCTGGCTGCC GTGCGGACTA 

451 GTTTACAGCG CGTCGCTTTA CGCGCTGGGA AGCGGTAGTG CGGCAACGGG 

501 CGGGTTATAT ATGCTTGCCT TTGCACTGGG TACGCTGCCC AATCTTTNGG 

551 CAATCGGCAT TTTTTCCCTG CAACTGNAAA AAATCATGCA AAACCGATAT 

601 ATCCGCCTGT GTACGGGATT ATCCGTATCA TTATGGGCAT TATGGAAACT 

651 TGCCGTCCTG TGGCTGTAA 

This encodes a protein having amino acid sequence <SEQ ID 396>: 

1 MNXDITFLTL FLLGFFGGTH CIGMCGGLSS AFALQLPPHI NRXWLILLLN
51 TGRVSSYTAI GLILGLIGQV GVSLDQTRVX QNILYTAANL LLLFLGLYLS
101 GISSLAAKIE KIGKPIWRNL NPILNRLLPI KSIPACLAVG ILWGWLPCGL
151 VYSASLYALG SGSAATGGLY MLAFALGTLP NLXAIGIFSL QLXKIMQNRY
201 IRLCTGLSVS LWALWKLAVL WL*

ORF103a and ORF103-1 show 97.7% identity in 222 aa overlap: 

10 20 30 40 50 60 

MNXDITFLTLFLLGFFGGTHCIGMCGGLSSAFALQLPPHINRXWLILLLNTGRVSSYTAI 
tl IIIIIIIIMIIIIiilllltllllllllllllMIIII t I I I I i I I I I i I I I i I I 
MNHDITFLTLFLLGFFGGTHCIGMCGGLSSAFALQLPPHINRFWLILLLNTGRVSSYTAI 
10 20 30 40 50 60 

70 80 90 100 110 120 

GLILGLIGQVGVSLDQTRVXQNILYTAANLLLLFLGLYLSGISSLAAKIEKIGKPIWRNL 
t I t I I I I I I I I I t I I I I I I t I I t i I I I I I i I f t i I I I I I I I I I I I I I I I I I t i I I I t M 
GLILGLIGQVGVSLDQTRVLQNILYTAANLLLLFLGLYLSGISSLAAKIEKIGKPIWRNL 
70 80 90 100 110 120 

130 140 150 160 170 180 

NPILNRLLPIKSIPACLAVGILWGWLPCGLVYSASLYALGSGSAATGGLYMLAFALGTLP 
t M I I I I i I t t I I I I I I I i t I I 1 1 I I I I I I I I I M I I t I M I M I I I I I I I I I i I M I I t 
NPILNRLLPIKSIPACLAVGILWGWLPCGLVYSASLYALGSGSAATGGLYMLAFALGTLP 
130 140 150 160 170 180 

190 200 210 220 

NLXAIGIFSLQLXKIMQNRYIRLCTGLSVSLWALWKLAVLWLX 
II I I I [ I I I I I I M I I t I t i I I I I I I I I I I I I I I I M I M I 
NLLAIGIFSLQLKKIMQNRYIRLCTGLSVSLWALWKLAVLWLX
190 200 210 220 

Homology with a predicted ORF from N.gonorrhoeae

ORF103 shows 95.5% identity over a 222aa overlap with a predicted ORF (ORF103.ng) from N.
gonorrhoeae:

orf 103 . pep MNHDITFLTLFLLGXFGGTHCIGMCGGLSSAFXXQLPPHINRFWLILLLNTGRVSSYTAI 60 

I I I I I I I i I i I i I i I I I I I I t I I M I I i i t I I i I I M I I i I I I I I I I I I I : t I I i I i 
Orfl03ng MNHDITFLTLFLLGFFGGTHCIGMCGGLSSAFALQLPPHINRFWLILLLNTGRISSYTAI 60 

orf 103 . pep GLILGLIGQVGVSLDQTRVLQNILYTAANLLLLFLGLYLSGISSLAAKIEKIGKPIWRNL 120 

I I : I I I I I I : I : I I M I I I I I I I I I I I : t I t I I I t I I I I I t t It I I I I I I i I I I I I I I I t 
orf103ng GLMLGLIGQLGISLDQTRVLQNILYTASNLLLLFLGLYLSGISSLAAKIEKIGKPIWRNL 120

orf 103 . pep NPILNRLLPIKSIPACLAVGILWGWLPCGLVYSASLYALGSGSAATGGLYMLAFALGTLP 180 

I II I M I) II I I I i I i II II I II I II I I I I I II II I II II II II : I II II II II M I I tl 
orfl03ng NPILNRLLPIKSIPACLAVGILWGWLPCGLVYSASLYALGSGSATTGGLYMLAFALGTLP 180 

orf 103 . pep NLLAIGIFSLQLXKIMQNRYIRLCTGLSVSLWALWKLAVLWL 222 

I t t I I I t I I t t I I t I I I M I I I I I I I I I I I I I t I I I I I I I 1 
Orfl03ng NLLAIGIFSLQLKKIMQNRYIRLCTGLSVSLWALWKLAVLWL 222 

The complete length ORF103ng nucleotide sequence <SEQ ID 397> is: 

1 ATGAACCACG ACATCACTTT CCTCACCCTG TTCCTGCTCG GTTTCTTCGG 

51 CGGAACTCAC TGCATCGGTA TGTGCGGCGG ATTAAGCAGC GCGTTTGCGC

101 TCCAACTCCC CCCGCATATC AACCGCTTTT GGCTGATTCT GCTGCTTAAC 

151 ACAGGACGGA TAAGCAGCTA TACGGCAATC GGCCTGATGC TCGGATTAAT 

201 CGGACAACTC GGCATTTCAC TCGACCAAAc ccgcgTCCTG CAAAATATTT

251 tatacacagc ctccaaCCTC CTGCTGCTCT TTTTAGGCTT ATACTTGAGC 

301 GGTATTTCTT CCTTGGCGGC AAAAATCGAG AAAATCGGCA AACCGATATG 

351 GCGCAACCTG AACCCGATAC TCAACCGGCT GCTGCCCATA AAATCCATAC 

401 CCGCCTGCCT TGCTGTCGGA ATATTATGGG GCTGGCTGCC GTGCGGACTG 

451 GTTTACAGCG CATCACTTTA CGCGCTGGGA AGCGGTAGTG CGACAACCGG 

501 CGGACTGTAT ATGCTTGCCT TTGCACTGGG TACGCTGCCC AATCTTTTGG 

551 CAATCGGCAT TTTTTCCCTG CAACTGAAAA AAATCATGCA AAACCGATAT 

601 ATCCGCCTGT GTACAGGATT ATCCGTATCA TTATGGGCAT TATGGAAGCT 

651 TGCCGTCCTG TGGCTGTAA 





This encodes a protein having amino acid sequence <SEQ ID 398>: 
1 MNHDITFLTL FLLGFFGGTH CIGMCGGLSS AFALQLPPHI NRFWLILLLN

51 TGRISSYTAI GLMLGLIGQL GISLDQTRVL QNILYTASNL LLLFLGLYLS

101 GISSLAAKIE KIGKPIWRNL NPILNRLLPI KSIPACLAVG ILWGWLPCGL

151 VYSASLYALG SGSATTGGLY MLAFALGTLP NLLAIGIFSL QLKKIMQNRY

201 IRLCTGLSVS LWALWKLAVL WL*

In addition, ORF103ng and ORF103-1 show 97.3% identity in 222 aa overlap: 

10 20 30 40 50 60 

orf 103-1 . pep MNHDITFLTLFLLGFFGGTHCIGMCGGLSSAFALQLPPHINRFWLILLLNTGRVSSYTAI 
MlilllMIMIIIIllltlllllllllltMtltitlllMllllllllll:|||||| 
orf103ng MNHDITFLTLFLLGFFGGTHCIGMCGGLSSAFALQLPPHINRFWLILLLNTGRISSYTAI

10 20 30 40 50 60 

70 80 90 100 110 120 

orf103-1.pep GLILGLIGQVGVSLDQTRVLQNILYTAANLLLLFLGLYLSGISSLAAKIEKIGKPIWRNL
II:tlllll:t:liltlMltllllll:llll[[||||MlliiMIMIltlllillll 
orfl03ng GLMLGLIGQLGISLDQTRVLQNILYTASNLLLLFLGLYLSGISSLAAKIEKIGKPIWRNL 

70 80 90 100 110 120 

130 140 150 160 170 180 

orf 103-1 . pep NPILNRLLPIKSIPACLAVGILWGWLPCGLVYSASLYALGSGSAATGGLYMLAFALGTLP 
IIIIIIIIIMIIIIIIIMf llllMIII[tllll]tllMtl:ltlllll)tllllii 
orfl03ng NPILNRLLPIKSIPACLAVGILWGWLPCGLVYSASLYALGSGSATTGGLYMLAFALGTLP 

130 140 150 160 170 180 

190 200 210 220 

orf 103-1 . pep NLLAIGIFSLQLKKIMQNRYIRLCTGLSVSLWALWKLAVLWLX 
I I I I I 1 I M I I I M I 1 I I I I I I t I I I t M I I t I M I I t I I I t I 
orfl03ng NLLAIGIFSLQLKKIMQNRYIRLCTGLSVSLWALWKLAVLWLX 

190 200 210 220 
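The identity figures quoted for these alignments can be checked by counting matching columns. The following is a minimal sketch, assuming the two sequences are already aligned to the same length; the function name `percent_identity` is illustrative and not part of the original analysis. It is applied to the second row (positions 61-120) of the ORF103-1 / ORF103ng alignment above.

```python
def percent_identity(seq_a, seq_b):
    """Return (identical, columns, percent) for two pre-aligned sequences.

    Assumption: both strings have equal length and gaps are written as '-'.
    A gap column is never counted as identical.
    """
    if len(seq_a) != len(seq_b):
        raise ValueError("sequences must be aligned to the same length")
    identical = sum(1 for x, y in zip(seq_a, seq_b) if x == y and x != "-")
    return identical, len(seq_a), 100.0 * identical / len(seq_a)


# Row 61-120 of the alignment, copied from the text above.
orf103_1_row = "GLILGLIGQVGVSLDQTRVLQNILYTAANLLLLFLGLYLSGISSLAAKIEKIGKPIWRNL"
orf103ng_row = "GLMLGLIGQLGISLDQTRVLQNILYTASNLLLLFLGLYLSGISSLAAKIEKIGKPIWRNL"

identical, columns, pct = percent_identity(orf103_1_row, orf103ng_row)
print(f"{identical}/{columns} identical ({pct:.1f}%)")
```

Over the full 222 aa overlap the rows above appear to differ at six positions, and 216/222 rounds to the 97.3% reported.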

Based on this analysis, including the presence of a putative leader sequence (double-underlined)
and several putative transmembrane domains (single-underlined) in the gonococcal protein, it is
predicted that the proteins from N.meningitidis and N.gonorrhoeae, and their epitopes, could be
useful antigens for vaccines or diagnostics, or for raising antibodies.



Example 47 

The following partial DNA sequence was identified in N.meningitidis <SEQ ID 399>: 

1 ATGGAAAACC AAAGGCCGCT CCTAGGCTTT CGCTTGGCAC TTTTGGCGGC 

51 GATGACGTGG GGAACGCTGC CGAT.TCCGT GCGGCAGGTA TTGAAGTTTG 

101 TCGATGCGCC GACGCTGGTG TGGGTGCGTT TTACCGTGGC GGCGGCGGTA 

151 TTGTTTGTTT TGCTGGCACT GGGCGGGCGG CTGCcGAAGC GGCGaGGATT 

201 TTTCTTGGTG CTCATTCAGG CTGCTGCTGC TCGGCGTGGC GGGCATTTCG 

251 GCAAACTTTG TGCTGATTGC CCAAGGGCTG CATTATATTT CGCCGACCAC 

301 GACGCAGGTT TTGTGGCAGA TTTCGCCGTT TACGATGATT GTwGTCGGTG 

351 TGTTGGTGTT TAAAGACCGG ATGACTGCCG CTCAGAAAAT CGGCTTGGTT 

401 TTGCTGCTTG CCGGTTTGCT TATGTATTTT AACGATAAAT TCGGCGAGTT 

451 GTCGGGTTTG GGCGCGTATG C.AAGGGCGT GTTGCTGTGT GCGGCAGGCA 

501 GTATGGCATG GGTGTGTAAT GCCGTGGCGC AAAAGCTGCT GTCGGCGCAA 

551 TTCGGGCCGC AACAGATTCT GCTGTTGATT TATGCGGCAA GTGCCGCCGT 

601 GTTCCTGCCG TTTGCCGAAC CGGCACACAT CGGAAGTATG GACGGTACGT 

651 TGGCGTGGGT ATGTATTGCG TATTGCTGCT TGAATACGTT AATCGGTTAC 

701 GGCTCGTTCG GCGAGGCGTT GAAACATTGG GAGGCTTCCA AAGTCAGCGC 

751 GGTAACAACC TTGCTCCCCG TGTTTACCGT AATAAATACT TTGCTCGGGC 

801 ATTATGTGAT GCCTGAAACT TTTGCCGCGC CGGA. . 

This corresponds to the amino acid sequence <SEQ ID 400; ORF104>: 

1 MENQRPLLGF RLALLAAMTW GTLPXSVRQV LKFVDAPTLV WVRFTVAAAV 

51 LFVLLALGGR LPKRRDFSWC SFRLLLLGVA GISANFVLIA QGLHYISPTT 

101 TQVLWQISPF TMIVVGVLVF KDRMTAAQKI GLVLLLAGLL MYFNDKFGEL

151 SGLGAYXKGV LLCAAGSMAW VCNAVAQKLL SAQFGPQQIL LLIYAASAAV 

201 FLPFAEPAHI GSMDGTLAWV CIAYCCLNTL IGYGSFGEAL KHWEASKVSA 
251 VTTLLPVFTV INTLLGHYVM PETFAAP. . .

Further work revealed further partial DNA sequence <SEQ ID 401>: 

1 ATGGAAAACC AAAGGCCGCT CCTAGGCTTC GCGTTGGCAC TTTTGGCGGC

51 GATGACGTGG GGAACGCTGC CGATTGCCGT GCGGCAGGTA TTGAAGTTTG 

101 TCGATGCGCC GACGCTGGTG TGGGTGCGTT TTACCGTGGC GGCGGCGGTA 

151 TTGTTTGTTT TGCTGGCACT GGGCGGGCGG CTGCCGAAGC GGCGGGATTT 

201 TTCTTGGTGC TCATTCAGGC TGCTGCTGCT CGGCGTGGCG GGCATTTCGG 

251 CAAACTTTGT GCTGATTGCC CAAGGGCTGC ATTATATTTC GCCGACCACG 

301 ACGCAGGTTT TGTGGCAGAT TTCGCCGTTT ACGATGATTG TTGTCGGTGT 

351 GTTGGTGTTT AAAGACCGGA TGACTGCCGC TCAGAAAATC GGCTTGGTTT 

401 TGCTGCTTGC CGGTTTGCTT ATGTTTTTTA ACGATAAATT CGGCGAGTTG 

451 TCGGGTTTGG GCGCGTATGC GAAGGGCGTG TTGCTGTGTG CGGCAGGCAG 

501 TATGGCATGG GTGTGTTATG CCGTGGCGCA AAAGCTGCTG TCGGCGCAAT 

551 TCGGGCCGCA ACAGATTCTG CTGTTGATTT ATGCGGCAAG TGCCGCCGTG 

601 TTCCTGCCGT TTGCCGAACC GGCACACATC GGAAGTTTGG ACGGTACGTT 

651 GGCGTGGGTT TGTTTTGCGT ATTGCTGCTT GAATACGTTA ATCGGTTACG 

701 GCTCGTTCGG CGAGGCGTTG AAACATTGGG AGGCTTCCAA AGTCAGCGCG 

751 GTAACAACCT TGCTCCCCGT GTTTACCGTA ATAwTwwCTT TGCTCGGGCA 

801 TTATGTGATG CCTGAAACTT TTGCCGCGCC GGA. . . 

This corresponds to the amino acid sequence <SEQ ID 402; ORF104-1>: 

1 MENQRPLLGF ALALLAAMTW GTLPIAVRQV LKFVDAPTLV WVRFTVAAAV

51 LFVLLALGGR LPKRRDFSWC SFRLLLLGVA GISANFVLIA QGLHYISPTT

101 TQVLWQISPF TMIVVGVLVF KDRMTAAQKI GLVLLLAGLL MFFNDKFGEL

151 SGLGAYAKGV LLCAAGSMAW VCYAVAQKLL SAQFGPQQIL LLIYAASAAV

201 FLPFAEPAHI GSLDGTLAWV CFAYCCLNTL IGYGSFGEAL KHWEASKVSA

251 VTTLLPVFTV IXXLLGHYVM PETFAAP...

Computer analysis of this amino acid sequence gave the following results: 

Homology with hypothetical HI0878 protein of H. influenzae (accession number U32769)
ORF104 and HI0878 show 40% aa identity in 277aa overlap:



orf104    4 QRPLLGFRLALLAAMTWGTLPXSVRQVLKFVDAPTLVWXXXXXXXXXXXXXXXXXXXXP- 62
            Q+PLLGF  AL+ AM WG+LP +++QVL  ++A T+VW                    P
HI0878    3 QQPLLGFTFALITAMAWGSLPIALKQVLSVMNAQTIVWYRFIIAAVSLLALLAYKKQLPE 62

orf104   63 --KRRDFSWCSFRLLLLGVAGISANFVLIAQGLHYISPTTTQVLWQISPFTMIVVGVLVF 120
              K R ++W    ++L+GV G+++NF+L +  L+YI P+  Q+   +S F M++ GVL+F
HI0878   63 LMKVRQYAW----IMLIGVIGLTSNFLLFSSSLNYIEPSVAQIFIHLSSFGMLICGVLIF 118

orf104  121 KDRMTAAQKIXXXXXXXXXXMYFNDKFGELSGLGAYXKGVLLCAAGSMAWVCNAVAQKLL 180
            K+++   QKI          ++FND+F   +GL  Y  GV+L   G++ WV   +AQKL+
HI0878  119 KEKLGLHQKIGLFLLLIGLGLFFNDRFDAFAGLNQYSTGVILGVGGALIWVAYQIAQKLM 178

orf104  181 SAQFGPQQILLLIYAASAAVFLPFAEPAHIGSMDGTLAWVCIAYCCLNTLIGYGSFGEAL 240
              +F  QQILL++Y   A  F+P A+ + +  +   LA +C  YCCLNTLIGYGS+ EAL
HI0878  179 LRKFNSQQILLMMYIX5CAIAFMPMADFSQVQELT-PLALICFIYCCLNTLIGYGSYAEAL 237

orf104  241 KHWEASKVSAVTTLLPVFTVINTLLGHYVMPETFAAP 277
              W+ SKVS V TL+P+FT++ + + HY  P  FAAP
HI0878  238 NRWDVSKVSVVITLVPLFTILFSHIAHYFSPADFAAP 274





Homology with a predicted ORF from N. meningitidis (strain A) 

ORF104 shows 95.3% identity over a 277aa overlap with an ORF (ORF104a) from strain A of N.
meningitidis:

10 20 30 40 50 60 

orf104.pep MENQRPLLGFRLALLAAMTWGTLPXSVRQVLKFVDAPTLVWVRFTVAAAVLFVLLALGGR
llllltllll I t I i I I I I I I I I I :|llllllllllllllltlll(ll!llllllllll 
orf 104a MENQRPLLGFALALLAAMTWGTLPIAVRQVLKFVDAPTLVWVRFTVAAAVLFVLLALGGR 

10 20 30 40 50 60 

70 80 90 100 110 120 
orf104.pep LPKRRDFSWCSFRLLLLGVAGISANFVLIAQGLHYISPTTTQVLWQISPFTMIVVGVLVF
III I t I I I I I II t I I II II I t I I t I II t I I I II t I I II II I I I II I II II II i I II M I
orf104a LPKWRDFSWCSFRLLLLGVAGISANFVLIAQGLHYISPTTTQVLWQISPFTMIVVGVLVF

70 80 90 100 110 120 



130 140 150 160 170 180 

orf104.pep KDRMTAAQKIGLVLLLAGLLMYFNDKFGELSGLGAYXKGVLLCAAGSMAWVCNAVAQKLL
I I II t I I I I I I I I I I II I I I I : t I I I II M I I I t I I I I I I I II I M I t II I I t I I I I I 
orf 104a KDRMTAAQKIGLVLLLAGLLMFFNDKFGELSGLGAYAKGVLLCAAGSMAWVCYAVAQKLL 

130 140 150 160 170 180 



190 200 210 220 230 240 

orf 104 . pep SAQFGPQQILLLIYAASAAVFLPFAEPAHIGSMDGTLAWVCIAYCCLNTLIGYGSFGEAL 
MllllllltlillMllllllltll )llll:IIIIMil:ltltlllitillMIIII 
orf104a SAQFGPQQILLLIYAASAAVFLPFAELAHIGSLDGTLAWVCFAYCCLNTLIGYGSFGEAL

190 200 210 220 230 240 



250 260 270 

orf104.pep KHWEASKVSAVTTLLPVFTVINTLLGHYVMPETFAAP
t I I I I I I I II I I 1 I t I I I I I 1 : I II I I I I i: I I I t I
orf104a KHWEASKVSAVTTLLPVFTVIFSLLGHYVMPDTFAAPDMNGLGYAGALVVVGGAVTAAVG

250 260 270 280 290 300 

The complete length ORF104a nucleotide sequence <SEQ ID 403> is: 



1 ATGGAAAACC AAAGGCCGCT CCTAGGCTTC GCGTTGGCAC TTTTGGCGGC 

51 GATGACGTGG GGAACGCTGC CGATTGCCGT GCGGCAGGTA TTGAAGTTTG 

101 TCGATGCGCC GACGCTGGTG TGGGTGCGTT TTACCGTGGC GGCGGCGGTA 

151 TTGTTTGTTT TGCTGGCATT GGGCGGGCGG CTGCCGAAGT GGCGGGATTT 

201 TTCTTGGTGC TCATTCAGGC TGCTGCTGCT CGGCGTGGCG GGCATTTCGG 

251 CAAACTTTGT GCTGATTGCC CAAGGGCTGC ATTATATTTC GCCGACCACG 

301 ACGCAGGTTT TGTGGCAGAT TTCGCCGTTT ACGATGATTG TTGTCGGTGT 

351 GTTGGTGTTT AAAGACCGGA TGACTGCCGC TCAGAAAATC GGCTTGGTTT 

401 TGCTGCTTGC CGGTTTGCTT ATGTTTTTTA ACGATAAATT CGGCGAGTTG 

451 TCGGGTTTGG GCGCGTATGC GAAGGGCGTG TTGCTGTGTG CGGCAGGCAG 

501 TATGGCATGG GTGTGTTATG CCGTGGCGCA AAAGCTGCTG TCGGCGCAAT 

551 TCGGGCCGCA ACAGATTCTG CTGTTGATTT ATGCGGCAAG TGCCGCCGTG 

601 TTCCTGCCGT TTGCCGAACT GGCACACATC GGAAGTTTGG ACGGTACGTT 

651 GGCGTGGGTT TGTTTTGCGT ATTGCTGCTT GAATACGTTA ATCGGTTACG 

701 GCTCGTTCGG CGAGGCGTTG AAACATTGGG AGGCTTCCAA AGTCAGCGCG 

751 GTAACAACCT TGCTCCCCGT GTTTACCGTA ATATTTTCTT TGCTCGGGCA 

801 TTATGTGATG CCTGATACTT TTGCCGCGCC GGATATGAAC GGTTTGGGTT 

851 ATGCCGGCGC ACTGGTCGTG GTCGGGGGTG CGGTTACGGC GGCGGTGGGG 

901 GACAGGCTGT TCAAACGCCG CTAG 

This encodes a protein having amino acid sequence <SEQ ID 404>: 



1 MENQRPLLGF ALALLAAMTW GTLPIAVRQV LKFVDAPTLV WVRFTVAAAV

51 LFVLLALGGR LPKWRDFSWC SFRLLLLGVA GISANFVLIA QGLHYISPTT

101 TQVLWQISPF TMIVVGVLVF KDRMTAAQKI GLVLLLAGLL MFFNDKFGEL

151 SGLGAYAKGV LLCAAGSMAW VCYAVAQKLL SAQFGPQQIL LLIYAASAAV

201 FLPFAELAHI GSLDGTLAWV CFAYCCLNTL IGYGSFGEAL KHWEASKVSA

251 VTTLLPVFTV IFSLLGHYVM PDTFAAPDMN GLGYAGALVV VGGAVTAAVG

301 DRLFKRR*



ORF104a and ORF104-1 show 98.2% identity in 277 aa overlap: 



10 20 30 40 50 60 

orf104a.pep MENQRPLLGFALALLAAMTWGTLPIAVRQVLKFVDAPTLVWVRFTVAAAVLFVLLALGGR
I I I I I I t I I I I It II t I I ) I I I I I I I I I I I M I I t I I I t I I I I I M t M I I ) I I I I I t I I 
orf104-1 MENQRPLLGFALALLAAMTWGTLPIAVRQVLKFVDAPTLVWVRFTVAAAVLFVLLALGGR

10 20 30 40 50 60 



70 80 90 100 110 120 

orf104a.pep LPKWRDFSWCSFRLLLLGVAGISANFVLIAQGLHYISPTTTQVLWQISPFTMIVVGVLVF
III llltllllMlllllllllllltllllMIIIIIIIMIIIIIIIIIIItllllll
orf104-1 LPKRRDFSWCSFRLLLLGVAGISANFVLIAQGLHYISPTTTQVLWQISPFTMIVVGVLVF

70 80 90 100 110 120 



130 140 150 160 170 180

orf104a.pep  KDRMTAAQKIGLVLLLAGLLMFFNDKFGELSGLGAYAKGVLLCAAGSMAWVCYAVAQKLL
             ||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
orf104-1     KDRMTAAQKIGLVLLLAGLLMFFNDKFGELSGLGAYAKGVLLCAAGSMAWVCYAVAQKLL

130 140 150 160 170 180

190 200 210 220 230 240 

orf 104a . pep SAQFGPQQILLLIYAASAAVFLPFAELAHIGSLDGTLAWVCFAYCCLNTLIGYGSFGEAL 
I II II I I It t II M II I I I I II II II I I II II I I I I II I I I I I 1 M I II I I I I I I I I II 
orf 104-1 SAQFGPQQILLLIYAASAAVFLPFAEPAHIGSLDGTLAWVCFAYCCLNTLIGYGSFGEAL 

190 200 210 220 230 240 

250 260 270 280 290 300 

orf104a.pep KHWEASKVSAVTTLLPVFTVIFSLLGHYVMPDTFAAPDMNGLGYAGALVVVGGAVTAAVG

I II I I I I I I I t I I I II I I I II 11111111:11111
orf104-1 KHWEASKVSAVTTLLPVFTVIXXLLGHYVMPETFAAP

250 260 270 



Homology with a predicted ORF from N.gonorrhoeae

ORF104 shows 93.9% identity over a 277aa overlap with a predicted ORF (ORF104.ng) from N.
gonorrhoeae:

orf104.pep MENQRPLLGFRLALLAAMTWGTLPXSVRQVLKFVDAPTLVWVRFTVAAAVLFVLLALGGR 60
II II I II I II I I I I I I I II I II I : I t I I I I I I II I I I I I I I II M II I I t II I I I I t i
orf104ng MENQRPLLGFALALLAAMTWGTLPIAVRQVLKFVDAPTLVWVRFTVAAAVLFVLLALGGR 60

orf104.pep LPKRRDFSWCSFRLLLLGVAGISANFVLIAQGLHYISPTTTQVLWQISPFTMIVVGVLVF 120
I I I I II I I I II I II I M I : I I I t I I I I I I II I II i I I I II I t I I I I I I I I M ) I I I I I I
orf104ng LPKRRDFSWHSFRLLLLGVTGISANFVLIAQGLHYISPTTTQVLWQISPFTMIVVGVLVF 120

orf104.pep KDRMTAAQKIGLVLLLAGLLMYFNDKFGELSGLGAYXKGVLLCAAGSMAWVCNAVAQKLL 180
llltllll[llllll|:|lll:tlllllllllllll I II I I II I I I i I I I I lllllll
orf104ng KDRMTAAQKIGLVLLLVGLLMFFNDKFGELSGLGAYAKGVLLCAAGSMAWVCYAVAQKLL 180

orf104.pep SAQFGPQQILLLIYAASAAVFLPFAEPAHIGSMDGTLAWVCIAYCCLNTLIGYGSFGEAL 240
I I I II I II I I 1 II I II II I I I I I I I I I I II : II I I I II t :: I I I I t I II I I I I I I I I I
orf104ng SAQFGPQQILLLIYAASAAVFLLXAEPAHIGSLDGTLAWVCFVYCCLNTLIGYGSFGEAL 240

orf104.pep KHWEASKVSAVTTLLPVFTVINTLLGHYVMPETFAAP 277
I i I I I I I I I II I I I I I I I I I I : I I I I I I I I : I I I I I
orf104ng KHWEASKVSAVTTLLPVFTVIFSLLGHYVMPDTFAAPDMNGLGYVGALVVVGGAVTAAVG 300



The complete length ORF104ng nucleotide sequence <SEQ ID 405> is predicted to encode a
protein having amino acid sequence <SEQ ID 406>:

1 MENQRPLLGF ALALLAAMTW GTLPIAVRQV LKFVDAPTLV WVRFTVAAAV

51 LFVLLALGGR LPKRRDFSWH SFRLLLLGVT GISANFVLIA QGLHYISPTT

101 TQVLWQISPF TMIVVGVLVF KDRMTAAQKI GLVLLLVGLL MFFNDKFGEL

151 SGLGAYAKGV LLCAAGSMAW VCYAVAQKLL SAQFGPQQIL LLIYAASAAV

201 FLLXAEPAHI GSLDGTLAWV CFVYCCLNTL IGYGSFGEAL KHWEASKVSA

251 VTTLLPVFTV IFSLLGHYVM PDTFAAPDMN GLGYVGALVV VGGAVTAAVG

301 DRPFKRR*

Further work revealed the complete gonococcal nucleotide sequence <SEQ ID 407>:

1 ATGGAAAACC AAAGGCCGCT CCTAGGCTTC GCGTTGGCAC TTTTGGCGGC

51 GATGACGTGG GGGACGCTGC CGATTGCCGT GCGGCAGGTA TTGAAGTTTG

101 TCGATGCGCC GACGCTGGTG TGGGTGCGTT TTACCGTGGC GGCGGCGGTA

151 TTGTTTGTTT TGCTGGCATT GGGCGGGCGG CTGCCGAAGC GGCGGGATTT

201 TTCTTGGCAT TCATTCAGGC TGCTGCTGCT CGGCGTGACG GGCATTTCGG

251 CAAACTTTGT GCTGATTGCC CAAGGGCTGC ATTATATTTC GCCGACCACG

301 ACGCAGGTTT TGTGGCAGAT TTCGCCGTTT ACGATGATTG TTGTCGGCGT

351 GTTGGTGTTT AAAGACCGGA tgaCTGCCGC GCAGAAAATC GGTTTGGTTT

401 TGCTGCttgT CGGTttgCTT ATGTTTTtta ACGACAAATT CGGCGAGTTG

451 TCGGGTTTGG GCGCGTATGC GAAGGGCGTG TTGCTGTGTG CGGCAGGCAG

501 TATGGCCTGG GTGTGTTATG CCGTGGCGCA AAAGCTGCTG TCGGCGCAAT

551 TCGGGCCGCA ACAGATTCTG CTGTTGATTT ATGCGGcaag tgccgccGTG

601 TTCCtgccgT TTGccgaaCC GGCACACATC GGAAGTTTgg aCGGTACGtt

651 GGCGTGGGTT TGTTTTGTGT ATTGCTGCTT GAATACGTTA ATCGGTTACG

701 GCTCGTTCGG CGAGGCGTTG AAACATTGGG AGGCTTCCAA AGTCAGCGCG

751 GTAACAACCT TGCTCCCCGT GTTTACCGTA ATATTTTCTT TGCTCGGGCA 

801 TTATGTGATG CCTGATACTT TTGCCGCGCC GGATATGAAC GGTTTGGGTT 

851 ATGTCGGCGC ACTGGTCGTG GTCGGGGGTG CGGTTACGGC GGCGGTGGGG 

901 GACAGGCCGT TCAAACGCCG CTAG 

This corresponds to the amino acid sequence <SEQ ID 408; ORF104ng-1>:

1 MENQRPLLGF ALALLAAMTW GTLPIAVRQV LKFVDAPTLV WVRFTVAAAV

51 LFVLLALGGR LPKRRDFSWH SFRLLLLGVT GISANFVLIA QGLHYISPTT

101 TQVLWQISPF TMIVVGVLVF KDRMTAAQKI GLVLLLVGLL MFFNDKFGEL

151 SGLGAYAKGV LLCAAGSMAW VCYAVAQKLL SAQFGPQQIL LLIYAASAAV

201 FLPFAEPAHI GSLDGTLAWV CFVYCCLNTL IGYGSFGEAL KHWEASKVSA

251 VTTLLPVFTV IFSLLGHYVM PDTFAAPDMN GLGYVGALVV VGGAVTAAVG

301 DRPFKRR*

ORF104ng-1 and ORF104-1 show 97.5% identity in 277 aa overlap:

10 20 30 40 50 60 

orf 104-1. pep MENQRPLLGFALALLAAMTWGTLPIAVRQVLKFVDAPTLVWVRFTVAAAVLFVLLALGGR 
I t I I I I I i I I I I I I I I I I I I t I I I I I I I It I It I I I I I I I I I I I t I I I It It I I I t I I I I 
Orfl04ng-1 MENQRPLLGFALALLAAMTWGTLPI AVRQVLKFVDAPTLVWVRFTVAAAVLFVLLALGGR 

10 20 30 40 50 60 

70 80 90 100 110 120 

orf 104-1 . pep LPKRRDFSWCSFRLLLLGVAGISANFVLIAQGLHYISPTTTQVLWQISPFTMIWGVLVF 
Itlltllll lllltlttt:illllllttllttlMintillltlllttttlllllltl 
orfl04ng-l LPKRRDFSWHSFRLLLLGVTGISANFVLIAQGLHYISPTTTQVLWQISPFTMIWGVLVF 

70 80 90 100 110 120 

130 140 150 160 170 180 

orf 104-1 . pep KDRMTAAQKIGLVLLLAGLLMFFNDKFGELSGLGAYAKGVLLCAAGSMAWVCYAVAQKLL 
M t 1 1 t II I It I I 1 t I : I t tl t I II t t I I I I I t 11 I II t I I I I t I t I I I 1 I I I I I I It t I 
orfl04ng-l KDRMTAAQKI GLVLLLVGLLMFFNDKFGELSGLGAYAKGVLLCAAGSMAWVCYAVAQKLL 

130 140 150 160 170 180 

190 200 210 220 230 240 

orf104-1.pep SAQFGPQQILLLIYAASAAVFLPFAEPAHIGSLDGTLAWVCFAYCCLNTLIGYGSFGEAL
I I I II I I I I I I I I I I I II I II i I I n I I I I I II I I II I I I t t : I II I I I I I I I I I I I 1 t I 
orf 104ng-l SAQFGPQQILLLIYAASAAVFLPFAEPAHIGSLDGTLAWVCFVYCCLNTLIGYGSFGEAL 
190 200 210 220 230 240 

250 260 270 

orf 104-1 . pep KHWEASKVSAVTTLLPVFTVIXXLLGHYVMPETFAAP 
I M I I I II M I II I I II I I II t I I I I I I I : I t I I I 
orfl04ng-l KHWEASKVSAVTTLLPVFTVIFSLLGHYVMPDTFAAPDMNGLGYVGALVWGGAVTAAVG 

250 260 270 280 290 300 

In addition, ORF104ng-1 shows significant homology with a hypothetical H. influenzae protein:

gi|1573895 (U32769) hypothetical [Haemophilus influenzae] Length = 306
Score = 237 bits (598), Expect = 8e-62 

Identities = 114/280 (40%), Positives = 168/280 (59%), Gaps = 8/280 (2%) 



Query: 


30 


Sbjct: 


3 


Query : 


89 


Sbjct: 


63 


Query: 


147 


Sbjct: 


119 


Query: 


207 


Sbjct: 


179 


Query: 


267 



Q+P M WG+LPIA++QVL ++A T+VW P 

QQPLLGFTFALITAMAWGSLPIAUCQVLSVMNAQTIVWYRFIIAAVSLLALLAYKKQLPE 62 

—KRRDFSWHSFRLLLLGVTGISANFVLIAQGLHYISPTTTQVLWQISPFTMIWGVLVF 14 6 

K R ++W ++L+GV G+++NF+L + L+YX P+ Q+ +S F M++ GVL+F 
LMKVRQYAW IMLIGVIGLTSNFLLFSSSLNYIEPSVAQIFIHLSSFGMLICGVLIF 118 



K+++ 



QKI 



+FFND+F +GL Y+ GV+L G++ WV Y +AQKL+ 



+F QQILL++Y 



F+P A+ + + L 



LA +CF+YCCLNTLIGYGS+ EAL 



Query: 267 KHWEASKVSAVTTLLPVFTVIFSLLGHYVMPDTFAAPDMN 306 
W+ SKVS V TL+P+FT++FS + HY P FAAP++N
Sbjct: 238 NRWDVSKVSWITLVPLFTILFSHIAHYFSPADFAAPELN 277 

Based on this analysis, including the presence of a putative leader sequence and several putative
transmembrane domains in the gonococcal protein, it is predicted that the proteins from
N.meningitidis and N.gonorrhoeae, and their epitopes, could be useful antigens for vaccines or
diagnostics, or for raising antibodies.
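The text does not say which program was used to predict the transmembrane domains. As an illustration only, the sketch below computes a sliding-window hydropathy profile with the Kyte-Doolittle scale, a common way of flagging candidate membrane-spanning stretches. The window of 19 residues and the threshold of 1.6 are commonly cited defaults and are assumptions here, as are the function names; unknown residues such as X are scored as 0.0.

```python
# Kyte-Doolittle hydropathy values per amino acid.
KYTE_DOOLITTLE = {
    "A": 1.8, "R": -4.5, "N": -3.5, "D": -3.5, "C": 2.5,
    "Q": -3.5, "E": -3.5, "G": -0.4, "H": -3.2, "I": 4.5,
    "L": 3.8, "K": -3.9, "M": 1.9, "F": 2.8, "P": -1.6,
    "S": -0.8, "T": -0.7, "W": -0.9, "Y": -1.3, "V": 4.2,
}


def hydropathy_profile(protein, window=19):
    """Mean hydropathy of every full window; empty list if the protein is shorter."""
    scores = [KYTE_DOOLITTLE.get(residue, 0.0) for residue in protein.upper()]
    return [
        sum(scores[i:i + window]) / window
        for i in range(len(scores) - window + 1)
    ]


def candidate_windows(protein, window=19, threshold=1.6):
    """Return (0-based start, mean) for windows whose mean reaches the threshold."""
    profile = hydropathy_profile(protein, window)
    return [(i, mean) for i, mean in enumerate(profile) if mean >= threshold]


# First 60 residues of ORF104ng-1 as listed above.
n_terminus = "MENQRPLLGFALALLAAMTWGTLPIAVRQVLKFVDAPTLVWVRFTVAAAVLFVLLALGGR"
for start, mean in candidate_windows(n_terminus):
    print(start + 1, round(mean, 2))
```

A hydropathy plot is only a first-pass screen; it does not distinguish a cleavable leader sequence from a membrane anchor.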

Example 48 

The following partial DNA sequence was identified in N.meningitidis <SEQ ID 409>: 

1 ATGGTAGCTC GTCGGGCTCA TAACCCGAAG GTCGTAGGTT CGAATCCTGT 

51 .CCCGCAACC TAATTTCAAA CCCCTCGGTT CAATGCCGAG GG.GTTTTGT

101 T.TTGCCTGT TTCCTGTTTC CTGTTTCCTG CCGCCTCCGT TTTTTGCCGG 

151 ATTTTCCTTC CGGCCGCAAT ATCGGAACGG CAGACCGCCG TCTGTTTGCG 

201 GTTGCAAATT CAGGCAGTTT GGCTACAATC TTCCGCATTG TCTTCAAGAA 

251 AGCCAACCAT GCCGACCGTC CGTTTTACCG AATCCGTCAG CAAACAAGAC 

301 CTTGATGCTC TGTTCGAGTG GGCAAAAGCA AGTTACGGTG CAGAAAGTTG 

351 CTGGAAAACG CTGTATCTGA ACGGTCysCC TTTGGGCAAC CTGTCGCCGG 

401 AATGGGTGGA ACGCGTsmmA AAAGACTGGG AGGCAGGCTG CyCGGAGTCT 

451 TCAGACGGCA TTTTTCTGAA TgCGGACGGc TGgCctGATA TGGgCGGAcg 

501 cTTACAGCAC CTCGCCCTCG GTTGGCACTG TGCGGGGCTG TTGGACGgsT 

551 GGCGCAACGA GTGTTTCGAC CTGACCGACG GCGGCGGCAA CCCCTTGTTC 

601 ACGCTCGaAc GCGCCGyTTT mCGTCCTkTC GGACTGCTCA GCCGCGCCGT 

651 CCATCTCAAC GGTCTGACCG AATCGGACGG CCGATGGCAT TTCTGGATAG 

701 GCAGGCGCAG TCCGCACAAA GCAGTCGATC CCAACAAACT CGACAATACT 

751 rCCGCCGGCG GTGTTTCCGG CGGCGAAATG CCGTCTGAAG CCGTGTGTCG 

801 CGAAAGCAGC GAAGAAGCCG GTTTGGATAA AACGCTGcTT CCGCTCATCC 

851 GCCCGGTATC GCAGCTGCAC AGCCTGCGCT CCGTCAGCCG GGGTGTACAC 

901 AATGAAATCC TGTATGTATT CGATGCCGTC CTGCCG. . . 

This corresponds to the amino acid sequence <SEQ ID 410; ORF105>:



1 MVARRAHNPK VVGSNPXPAT XFQTPRFNAE XVLXLPVSCF LFPAASVFCR

51 IFLPAAISER QTAVCLRLQI QAVWLQSSAL SSRKPTMPTV RFTESVSKQD

101 LDALFEWAKA SYGAESCWKT LYLNGXPLGN LSPEWVERVX KDWEAGCXES 

151 SDGIFLNADG WPDMGGRLQH LALGWHCAGL LDGWRNECFD LTDGGGNPLF 

201 TLERAXXRPX GLLSRAVHLN GLTESDGRWH FWIGRRSPHK AVDPNKLDNT 

251 XAGGVSGGEM PSEAVCRESS EEAGLDKTLL PLIRPVSQLH SLRSVSRGVH 

301 NEILYVFDAV LP. . . 
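The partial nucleotide sequences contain IUPAC ambiguity codes (for example y, s, m, k, w, r) and dots for undetermined bases, and the corresponding protein listings show X at some of those positions. The sketch below shows one way such a translation could be reproduced; it is an assumption about the convention, inferred from the listings, not a description of the software actually used. A codon is resolved when every expansion of its ambiguity codes gives the same residue, and any codon containing a dot is reported as X.

```python
from itertools import product

# IUPAC nucleotide ambiguity codes expanded to the bases they stand for.
IUPAC = {
    "A": "A", "C": "C", "G": "G", "T": "T",
    "R": "AG", "Y": "CT", "S": "CG", "W": "AT", "K": "GT", "M": "AC",
    "B": "CGT", "D": "AGT", "H": "ACT", "V": "ACG", "N": "ACGT",
}

# Standard genetic code, codons enumerated in TCAG order.
BASES = "TCAG"
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    a + b + c: AMINO_ACIDS[16 * i + 4 * j + k]
    for i, a in enumerate(BASES)
    for j, b in enumerate(BASES)
    for k, c in enumerate(BASES)
}


def translate_codon(codon):
    """Translate one codon; return 'X' when the residue cannot be determined."""
    codon = codon.upper()
    if any(base not in IUPAC for base in codon):
        return "X"  # '.' or any other unrecognised symbol
    residues = {
        CODON_TABLE["".join(bases)]
        for bases in product(*(IUPAC[base] for base in codon))
    }
    return residues.pop() if len(residues) == 1 else "X"


def translate(dna):
    """Translate in frame 1, ignoring whitespace and any trailing partial codon."""
    dna = "".join(dna.split())
    return "".join(translate_codon(dna[i:i + 3]) for i in range(0, len(dna) - 2, 3))


# Bases 361-390 of SEQ ID 409, which include the ambiguous codon 'Cys'.
print(translate("CTGTATCTGA ACGGTCysCC TTTGGGCAAC"))
```

With this convention the codon `Cys` (C, then C or T, then C or G) could encode proline or leucine and is therefore reported as X, matching residue 126 of the ORF105 listing, whereas `GTw` resolves to valine whichever base is present.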

Further work revealed the complete nucleotide sequence <SEQ ID 411>:



1 ATGCCGACCG TCCGTTTTAC CGAATCCGTC AGCAAACAAG ACCTTGATGC 

51 TCTGTTCGAG TGGGCAAAAG CAAGTTACGG TGCAGAAAGT TGCTGGAAAA 

101 CGCTGTATCT GAACGGTCTG CCTTTGGGCA ACCTGTCGCC GGAATGGGTG 

151 GAACGCGTCA AAAAAGACTG GGAGGCAGGC TGCTCGGAGT CTTCAGACGG 

201 CATTTTTCTG AATGCGGACG GCTGGCCTGA TATGGGCGGA CGCTTACAGC 

251 ACCTCGCCCT CGGTTGGCAC TGTGCGGGGC TGTTGGACGG CTGGCGCAAC 

301 GAGTGTTTCG ACCTGACCGA CGGCGGCGGC AACCCCTTGT TCACGCTCGA 

351 ACGCGCCGCT TTCCGTCCTT TCGGACTGCT CAGCCGCGCC GTCCATCTCA 

401 ACGGTCTGAC CGAATCGGAC GGCCGATGGC ATTTCTGGAT AGGCAGGCGC 

451 AGTCCGCACA AAGCAGTCGA TCCCAACAAA CTCGACAATA CTGCCGCCGG

501 CGGTGTTTCC GGCGGCGAAA TGCCGTCTGA AGCCGTGTGT CGCGAAAGCA 

551 GCGAAGAAGC CGGTTTGGAT AAAACGCTGC TTCCGCTCAT CCGCCCGGTA 

601 TCGCAGCTGC ACAGCCTGCG CTCCGTCAGC CGGGGTGTAC ACAATGAAAT 

651 CCTGTATGTA TTCGATGCCG TCCTGCCCGA AACCTTCCTG CCTGAAAATC 

701 AGGATGGCGA AGTGGCGGGT TTTGAGAAAA TGGACATCGG CGGTCTGTTG 

751 GATGCCATGT TGTCGGGAAA CATGATGCAC GACGCGCAAC TGGTTACGCT 

801 GGACGCGTTT TGCCGTTACG GTCTGATTGA TGCCGCCCAT CCGCTGTCCG 

851 AGTGGCTGGA CGGCATACGT TTATAG 

This corresponds to the amino acid sequence <SEQ ID 412; ORF105-1>: 
1 MPTVRFTESV SKQDLDALFE WAKASYGAES CWKTLYLNGL PLGNLSPEWV

51 ERVKKDWEAG CSESSDGIFL NADGWPDMGG RLQHLALGWH CAGLLDGWRN

101 ECFDLTDGGG NPLFTLERAA FRPFGLLSRA VHLNGLTESD GRWHFWIGRR

151 SPHKAVDPNK LDNTAAGGVS GGEMPSEAVC RESSEEAGLD KTLLPLIRPV

201 SQLHSLRSVS RGVHNEILYV FDAVLPETFL PENQDGEVAG FEKMDIGGLL

251 DAMLSGNMMH DAQLVTLDAF CRYGLIDAAH PLSEWLDGIR L*



Computer analysis of this amino acid sequence gave the following results: 
Homology with a predicted ORF from N.meningitidis (strain A)

ORF105 shows 89.4% identity over a 226aa overlap with an ORF (ORF105a) from strain A of N.
meningitidis:

60 70 80 90 100 110 

orf105.pep ISERQTAVCLRLQIQAVWLQSSALSSRKPTMPTVRFTESVSKQDLDALFEWAKASYGAES

lilitliltIII:|lilMilltllllllt
orf105a MPTVRFTESVSKHDLDALFEWAKASYGAES
10 20 30

120 130 140 150 160 170 

orf105.pep CWKTLYLNGXPLGNLSPEWVERVXKDWEAGCXESSDGIFLNADGWPDMGGRLQHLALGWH
I I I I M I I I I I I I I I I M : I I i I I I I i I t I I I I t I I I I I I I I I I 1 I I I I M I I :
orf105a CWKTLYLNGLPLGNLSPEWAERVKKDWEAGCSESSDGIFLNADGWPDMGRRLQHLARIWK

40 50 60 70 80 90 

180 190 200 210 220 230 

orf 105 . pep CAGLLDGWRNECFDLTDGGGNPLFTLERAXXRPXGLLSRAVHLNGLTESDGRWHFWIGRR 
I I M I t I : I t I I M I I I : t I t I : I t I I I I I I I I I I I I I I I I : I I I I I I I I [ I I I I

orf 105a EAGLLHGWRDECFDLTDGGSNPLFALERAAFRPFGLLSRAVHLNGLVESDGRWHFWIGRR 
100 110 120 130 140 150 

240 250 260 270 280 290 

orf105.pep SPHKAVDPNKLDNTXAGGVSGGEMPSEAVCRESSEEAGLDKTLLPLIRPVSQLHSLRSVS

i I I I I I I I : I i f I I I I I I I : t I : I I I : I I I I t I I t I t i I I I I I I t I i t M M I I I I It
orf105a SPHKAVDPDKLDNTAAGGVSSGELPSETVCRESSEEAGLDKTLLPLIRPVSQLHSLRPVS
160 170 180 190 200 210

300 310

or f 105 . pep RGVHNEILYVFDAVLP 
I I I I M I I I I I I M I i 

orf 105a RGVHNEILYVFDAVLPETFLPENQDGEVAGFEKMDIGGLLAAMLSGNMMHDAQLVTLDAF 
220 230 240 250 260 270 

The complete length ORF105a nucleotide sequence <SEQ ID 413> is:

1 ATGCCGACCG TCCGTTTTAC CGAATCCGTC AGCAAACACG ACCTTGATGC

51 CCTATTCGAG TGGGCAAAGG CAAGTTACGG TGCGGAAAGT TGCTGGAAAA

101 CGCTGTATCT GAACGGTCTG CCTTTGGGCA ATCTGTCGCC GGAATGGGCG

151 GAGCGCGTCA AAAAAGACTG GGAGGCAGGC TGCTCGGAGT CTTCAGACGG

201 CATTTTCCTG AATGCGGACG GCTGGCCAGA TATGGGCAGA CGCTTGCAGC

251 ACCTCGCCCG AATATGGAAA GAAGCGGGAC TGCTTCACGG CTGGCGCGAC

301 GAGTGTTTCG ACCTGACCGA CGGCGGCAGC AATCCCTTGT TCGCGCTCGA

351 ACGCGCCGCT TTCCGTCCGT TCGGACTGCT CAGCCGCGCC GTCCATCTCA

401 ACGGTTTGGT CGAATCGGAC GGCCGATGGC ATTTCTGGAT AGGCAGGCGC

451 AGTCCGCACA AAGCAGTCGA TCCCGACAAA CTCGACAATA CTGCCGCCGG

501 CGGTGTTTCC AGCGGTGAAT TGCCGTCTGA AACCGTGTGT CGCGAAAGCA

551 GCGAAGAAGC CGGTTTGGAT AAAACGCTGC TTCCGCTCAT CCGCCCGGTA

601 TCGCAGCTGC ACAGCCTGCG CCCCGTCAGC CGGGGTGTGC ACAATGAAAT

651 CCTGTATGTA TTCGATGCCG TCCTGCCCGA AACCTTCCTG CCTGAAAATC

701 AGGATGGCGA AGTGGCGGGT TTTGAGAAAA TGGACATCGG CGGTCTGTTG

751 GCTGCCATGT TGTCGGGAAA CATGATGCAC GACGCGCAAC TGGTTACGCT

801 GGACGCGTTT TGCCGTTACG GTCTGATTGA TGCCGCCCAT CCGCTGTCCG

851 AGTGGCTGGA CGGCATACGT TTATAG


This encodes a protein having amino acid sequence <SEQ ID 414>: 



1 MPTVRFTESV SKHDLDALFE WAKASYGAES CWKTLYLNGL PLGNLSPEWA

51 ERVKKDWEAG CSESSDGIFL NADGWPDMGR RLQHLARIWK EAGLLHGWRD

101 ECFDLTDGGS NPLFALERAA FRPFGLLSRA VHLNGLVESD GRWHFWIGRR

151 SPHKAVDPDK LDNTAAGGVS SGELPSETVC RESSEEAGLD KTLLPLIRPV 

201 SQLHSLRPVS RGVHNEILYV FDAVLPETFL PENQDGEVAG FEKMDIGGLL 

251 AAMLSGNMMH DAQLVTLDAF CRYGLIDAAH PLSEWLDGIR L* 

ORF105a and ORF105-1 show 93.8% identity in 291 aa overlap (upper sequence: orf105a.pep; lower sequence: orf105-1):

10 20 30 40 50 60 

MPTVRFTESVSKHDLDALFEWAKASYGAESCWKTLYLNGLPLGNLSPEWAERVKKDWEAG 
H I I I t I I I i I I : I I t t I i ! I t I I 1 i I I I I I I I I I M I I I I I I I I I I I I: I I I I i I I I t I 
MPTVRFTESVSKQDLDALFEWAKASYGAESCWKTLYLNGLPLGNLSPEWVERVKKDWEAG 
10 20 30 40 50 60 

70 80 90 100 110 120 

CSESSDGIFLNADGWPDMGRRLQHLARIWKEAGLLHGWRDECFDLTDGGSNPLFALERAA 
I I I t I I i i I i I I I I I I I I [ IIMII I: Mil I I I : I H I I I I i I : I I I I : I I t I I 
CSESSDGIFLNADGWPDMGGRLQHLALGWHCAGLLDGWRNECFDLTDGGGNPLFTLERAA 
70 80 90 100 110 120 

130 140 150 160 170 180 

FRPFGLLSRAVHLNGLVESDGRWHFWIGRRSPHKAVDPDKLDNTAAGGVSSGELPSETVC 
lllliltl)IIIMII:|llllill[|lllllltilll:|IMIIillti:ll:iM:ll 
FRPFGLLSRAVHLNGLTESDGRWHFWIGRRSPHKAVDPNKLDNTAAGGVSGGEMPSEAVC 
130 140 150 160 170 180 

190 200 210 220 230 240 

RESSEEAGLDKTLLPLIRPVSQLHSLRPVSRGVHNEILYVFDAVLPETFLPENQDGEVAG 
It I I I I I I M I I I I I i I t I I 1 I I [ 1 I I I I I i t I I I I I I t I I I I I I I i I I M I I I I I I M 
RESSEEAGLDKTLLPLIRPVSQLHSLRSVSRGVHNEILYVFDAVLPETFLPENQDGEVAG 
190 200 210 220 230 240 

250 260 270 280 290 

FEKMDIGGLLAAMLSGNMMHDAQLVTLDAFCRYGLIDAAHPLSEWLDGIRLX 
M I I I I I I I I I i I I I I M I I i I I M I I I I I t I I i I I I i I I I I I I I I t t I t I 
FEKMDIGGLLDAMLSGNMMHDAQLVTLDAFCRYGLIDAAHPLSEWLDGIRLX 
250 260 270 280 290 

Homology with a predicted ORF from N.gonorrhoeae

ORF105 shows 87.5% identity over a 312aa overlap with a predicted ORF (ORF105.ng) from N.
gonorrhoeae:

orf105.pep MVARRAHNPKVVGSNPXPATXFQTPRFNAEXVLXLPVSCFLFPAASVFCRIFLPAAISER 60

I I ! I I t t t i I t I I I i I III : I I I I I i I I II II I I M I I t It t I I I II II I I 

orf105ng MVARRAHNPKVVGSNPAPATKYQTPRFNAEGVLF-----FLFPAASVFCRIFLPAAISER 55

orf 105 . pep QTAVCLRLQIQAVWLQSSALSSRKPTMPTVRFTESVSKQDLDALFEWAKASYGAESCWKT 120 

I : I II I I I I I I I I I II I I I I I I I I : I t I I I M I I I I I I I I I I II I I I I II I I I I I I II 
orfl05ng QAAVCLRLQIQAVWLQSSALCSRKPAMPTVRFTESVSKQDLDALFERAKASYGAESCWKT 115 

orf 105. pep LYLNGXPLGNLSPEWVERVXKDWEAGCXESSDGIFLNADGWPDMGGRLQHLALGWHCAGL 180 

lilt 111111111:11: I I I It II I 1 I : It I I t I I I 1 I II II t I I I I I I : III 
orf105ng LYLNRLPLGNLSPEWAERIKKDWEAGCSESSNGIFLNADGWPDMGGRLQHLARTWNKAGL 175

orf 105. pep LDGWRNECFDLTDGGGNPLFTLERAXXRPXGLLSRAVHLNGLTESDGRWHFWIGRRSPHK 240 

I I I I I I I I I t I I I I I I II I I I I I I II III I I I I I I t I : I I : I I I t I I I I t I t i t I 
orf105ng LHGWRNECFDLTDGGGNPLFTLERAAFRPFGLLIRAVHLNGLVESNGRWHFWIGRRSPHK 235

orf 105 . pep AVDPNKLDNTXAGGVSGGEMPSEAVCRESSEEAGLDKTLLPLIRPVSQLHSLRSVSRGVH 300 

1111:1111 : I I I t I t I I I I I I I I I II I I I I I I t I I 1 : I I ! I I I I : I M I I 1 I I I I I 
orf105ng AVDPGKLDNIAGGGVSGGEMPSEAVCRESSEEAGLDKTLFPLIRPVSRLHSLRPVSRGVH 295

orf 105. pep NEILYVFDAVLP 312 
t It t I I I I I I I I 

orfl05ng NEILYVFDAVLPETFLPENQDGEVAGFEKMDIGGLLDAMLSKNMMHDAQLVTLDAFYRYG 355 

A complete length ORF105ng nucleotide sequence <SEQ ID 415> was predicted to encode a
protein having amino acid sequence <SEQ ID 416>:

1 MVARRAHNPK VVGSNPAPAT KYQTPRFNAE GVLFFLFPAA SVFCRIFLPA

51 AISERQAAVC LRLQIQAVWL QSSALCSRKP AMPTVRFTES VSKQDLDALF 

101 ERAKASYGAE SCWKTLYLNR LPLGNLSPEW AERIKKDWEA GCSESSNGIF 

151 LNADGWPDMG GRLQHLARTW NKAGLLHGWR NECFDLTDGG GNPLFTLERA 

201 AFRPFGLLIR AVHLNGLVES NGRWHFWIGR RSPHKAVDPG KLDNIAGGGV 

251 SGGEMPSEAV CRESSEEAGL DKTLFPLIRP VSRLHSLRPV SRGVHNEILY 

301 VFDAVLPETF LPENQDGEVA GFEKMDIGGL LDAMLSKNMM HDAQLVTLDA 

351 FYRYGLIDAA HPLSEWLDGI RL* 

Further work revealed the complete nucleotide sequence <SEQ ID 417>:

1 ATGCCGACCG TCCGTTTTAC CGAATCCGTC AGCAAACAAG ACCTTGATGC 

51 CCTGTTCGAG CGGGCAAAAG CAAGTTACGG TGCCGAAAGT TGCTGGAAAA 

101 CGCTGTATCT GAACCGTCTT CCTTTGGGCA ATCTGTCGCC GGAATGGGCT 

151 GAGCGCATCA AAAAAGACTG GGAGGCAGGC TGCTCCGAGT CTTCAGACGG 

201 CATTTTTCTG AATGCGGACG GCTGGCCGGA TATGGGCGGA CGCTTGCAGC 

251 ACCTCGCCCG CACATGGAAC AAGGCGGGGC TGCTTCACGG ATGGCGCAAC 

301 GAGTGTTTCG ACCTGACCGA CGGCGGCGGC AACCCCTTGT TCACGCTCGA 

351 ACGCGCCGCT TTCCGTCCGT TCGGACTACT CAGCCGCGCC GTCCATCTCA 

401 ACGGTTTGGT CGAATCGAAC GGCAGATGGC ATTTTTGGAT AGGCAGGCGC 

451 AGTCCGCACA AAGCAGTCGa tcCCGGCAAG CTCGACAATA TTGCCGGCGG 

501 CGGTGTTTCC GGCGGCGAAA TGCCGTCTGA AGCCGTGTGC CGCGAAAGCA 

551 GCGAAGAAGC CGGTTTGGAT AAAACGCTGT TTCCGCTCAT CCGCCCAGTA 

601 TCGCGGCTGC ACAGCCTTCG CCCCGTCAGC CGAGGTGTGC ACAATGAAAT 

651 CCTGTATGTG TTCGATGCCG TCCTGCCCGA AACCTTCCTG CCTGAAAATC 

701 AGGATGGCGA GGTAGCGGGT TTTGAAAAGA TGGACATTGG CGGCCTATTG 

751 GATGCCATGT TGTCGAAAAA CATGATGCAC GACGCGCAAC TGGTTACGCT 

801 GGACGCGTTT TACCGTTACG GTCTGATTGA TGCCGCCCAT CCGCTGTCCG 

851 AGTGGCTGGA CGGCATACGT TTATAG 

This corresponds to the amino acid sequence <SEQ ID 418; ORF105ng-1>:

1 MPTVRFTESV SKQDLDALFE RAKASYGAES CWKTLYLNRL PLGNLSPEWA 

51 ERIKKDWEAG CSESSDGIFL NADGWPDMGG RLQHLARTWN KAGLLHGWRN 

101 ECFDLTDGGG NPLFTLERAA FRPFGLLSRA VHLNGLVESN GRWHFWIGRR 

151 SPHKAVDPGK LDNIAGGGVS GGEMPSEAVC RESSEEAGLD KTLFPLIRPV 

201 SRLHSLRPVS RGVHNEILYV FDAVLPETFL PENQDGEVAG FEKMDIGGLL 

251 DAMLSKNMMH DAQLVTLDAF YRYGLIDAAH PLSEWLDGIR L* 
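The distinction drawn in these examples between partial and complete sequences can be checked mechanically. The sketch below is illustrative only (the function name `check_orf` and the returned keys are assumptions): it tests whether a nucleotide sequence starts with ATG, is a whole number of codons, ends with a stop codon, and has no internal stop, using the standard stop codons TAA, TAG and TGA.

```python
STOP_CODONS = {"TAA", "TAG", "TGA"}


def check_orf(dna):
    """Summarise basic open-reading-frame properties of a nucleotide string.

    Whitespace (as used in the 10-base blocks of the listings) is ignored.
    """
    seq = "".join(dna.split()).upper()
    usable = len(seq) - len(seq) % 3
    codons = [seq[i:i + 3] for i in range(0, usable, 3)]
    ends_with_stop = bool(codons) and codons[-1] in STOP_CODONS
    return {
        "length_nt": len(seq),
        "starts_with_atg": seq.startswith("ATG"),
        "in_frame": len(seq) % 3 == 0,
        "ends_with_stop": ends_with_stop,
        "internal_stops": sum(c in STOP_CODONS for c in codons[:-1]),
        "residues": len(codons) - (1 if ends_with_stop else 0),
    }


# Toy input built from the first 15 bases of SEQ ID 417 plus its last 6 bases.
print(check_orf("ATGCCGACCG TCCGT TTATAG"))
```

SEQ ID 417 as listed runs to base 876, that is 292 codons including the terminal TAG, which is consistent with the 291-residue protein of SEQ ID 418.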

ORF105ng-1 and ORF105-1 show 93.5% identity in 291 aa overlap (upper sequence: orf105-1.pep; lower sequence: orf105ng-1):

10 20 30 40 50 60 

MPTVRFTESVSKQDLDALFEWAKASYGAESCWKTLYLNGLPLGNLSPEWVERVKKDWEAG 
I I I I I 1 I i I I I I I I I I t I I i I t I I I i I I I n I I I i I i I I t I I I f I I I : I I : I I I I i I I 
MPTVRFTESVSKQDLDALFERAKASYGAESCWKTLYLNRLPLGNLSPEWAERIKKDWEAG 
10 20 30 40 50 60 

70 80 90 100 110 120 

CSESSDGIFLNADGWPDMGGRLQHLALGWHCAGLLDGWRNECFDLTDGGGNPLFTLERAA 
I I i I I I I I I I I I I t i I I I M I I M I I i: till I I I I I I I i I t I I M I I I I I I I i I I 
CSESSDGIFLNADGWPDMGGRLQHLARTWNKAGLLHGWRNECFDLTDGGGNPLFTLERAA 
70 80 90 100 110 120 

130 140 150 160 170 180 

FRPFGLLSRAVHLNGLTESDGRWHFWIGRRSPHKAVDPNKLDNTAAGGVSGGEMPSEAVC 
I t I I I 1 I I I I i I I I t I : I I : I I [ I I I I I I 1 I I I i I I M : I I t I I : I I I I I I I I i I 1 I I I 
FRPFGLLSRAVHLNGLVESNGRWHFWIGRRSPHKAVDPGKLDNIAGGGVSGGEMPSEAVC 
130 140 150 160 170 180 

190 200 210 220 230 240 

RESSEEAGLDKTLLPLIRPVSQLHSLRSVSRGVHNEILYVFDAVLPETFLPENQDGEVAG 
I I I I I I I I I I I I I : I I t I I I I : 1 I I I I I I I I I i I I I I I t t I I I I t I I I I I I I t I I t I I I 
RESSEEAGLDKTLFPLIRPVSRLHSLRPVSRGVHNEILYVFDAVLPETFLPENQDGEVAG 
190 200 210 220 230 240 

250 260 270 280 290 

FEKMDIGGLLDAMLSGNMMHDAQLVTLDAFCRYGLIDAAHPLSEWLDGIRLX 
I I I I I I t I I I t i t I I I I I I t I I I I I I I I I I t I i I I I I t I t I I I I I I 1 I I I 
FEKMDIGGLLDAMLSKNMMHDAQLVTLDAFYRYGLIDAAHPLSEWLDGIRLX 
250 260 270 280 290 



Furthermore, ORF105ng-1 shows homology with a yeast enzyme:

sp|P41888|TNR3_SCHPO THIAMIN PYROPHOSPHOKINASE (TPK) (THIAMIN KINASE)
>gi|1076928|pir||S52350 thiamin pyrophosphokinase (EC 2.7.6.2) - fission yeast
(Schizosaccharomyces pombe) >gi|666111 (X84417) thiamin pyrophosphokinase
[Schizosaccharomyces pombe] >gi|2330852|gnl|PID|e334056 (Z98533) thiamin
pyrophosphokinase [Schizosaccharomyces pombe] Length = 569
Score = 105 bits (259), Expect = 4e-22

Identities = 64/192 (33%), Positives = 94/192 (48%), Gaps = 3/192 (1%)

Query: 268 NKAGLLHGWRNECFDLTDGGGNPLFTLERAAFRPFGLLSRAVHLNGLVESNGRW--HFWI 441

N G+ WRNE + -f P+ +ER F FG LS VH + + W+ 

Sbjct: 96 NTFGIADQWRNELYTVYGKSKKPVLAVERGGFWLFGFLSTGVHCTMYIPATKEHPLRIWV 155 

Query: 442 GRRSPHKAVDPGKLDNIAGGGVSGGEMPSEAVCRESSEEAGLDKTLFPLIRPVSRLHSLR 621 

RRSP K P LDN GG++ G-*- + +E SEEA LD + LI P + ++ 

Sbjct: 156 PRRSPTKQTWPNYLDNSVAGGIAHGDSVIGTMIKEFSEEANLDVSSMNLI-PCGTVSYIK 214 

Query: 622 PVSRG-VHNEILYVFDAVLPETFLPENQDGEVAGFEKMDIGGLLDAMLSKNMMHDAQLVT 798 

R + E+ YVFD + + +P DGEVAGF + + +L + K+ + LV 
Sbjct: 215 MEKRHWIQPELQYVFDLPVDDLVIPRINDGEVAGFSLLPLNQVLHELELKSFKPNCALVL 274 

Query: 799 LDAFYRYGLIDAAHP 843 

LD R+G+I HP 
Sbjct: 275 LDFLIRHGIITPQHP 289 
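
The summary line of the BLAST report above can be read programmatically. The following is a minimal sketch in Python, not part of the original disclosure; the function name `parse_blast_summary` is hypothetical, and the regular expression assumes the "Field = n/total (p%)" layout shown in the report.

```python
import re

# Matches fields such as "Identities = 64/192 (33%)" in a BLAST summary line.
FIELD = re.compile(r"(Identities|Positives|Gaps)\s*=\s*(\d+)/(\d+)")


def parse_blast_summary(line):
    """Return {field: (count, alignment_length)} for one BLAST summary line."""
    return {
        name.lower(): (int(count), int(total))
        for name, count, total in FIELD.findall(line)
    }


summary = parse_blast_summary(
    "Identities = 64/192 (33%), Positives = 94/192 (48%), Gaps = 3/192 (1%)"
)
for name, (count, total) in summary.items():
    # Recompute each percentage from the raw counts.
    print(f"{name}: {count}/{total} = {100 * count / total:.1f}%")
```

Recomputing from the counts gives slightly higher values than the integers printed in the report (for example 94/192 is just under 49%), which is consistent with the report truncating rather than rounding.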

Based on this analysis, including the presence of a putative transmembrane domain in the 
gonococcal protein, it is predicted that the proteins from N.meningitidis and N.gonorrhoeae, and 
their epitopes, could be useful antigens for vaccines or diagnostics, or for raising antibodies. 



Example 49 

The following DNA sequence, believed to be complete, was identified in N.meningitidis <SEQ ID 
419>: 

1 ATGAATAGAC CCAAGCAACC CTTCTTCCGT CCCGAAGTCG CCGTTGCCCG 

51 CCAAACCAGC CTGACGGGTA AAGTGATTCT GACACGACCG TTGTCATTTT 

101 CCCTATGGAC GACATTTGCA TCGATATCTG CGTTATTGAT TATCCTGTTT 

151 TTGATATTTG GTAACTATAC GCGAAAGACA ACAGTGGAGG GACAAATTTT 

201 ACCTGCATCG GGCGTAATCA GGGTGTATGC ACCGgATACG rGkACAATTA 

251 CAGCGAAATT CGTGGAAGAT GGmsAAAAGG TTAAGGCTGG CGACAAGCTA 

301 TTTGCGCTTT CGACCTCACG TTTCGGCGCA GGAGGTAGCG TGCAGCAGCA 

351 GTTGAAAACG GAGGCAGTTT TGAAGAAAAC GTTGGCAGAA CAGGAACTGG 

401 GTCGTCTGAA GCTGATACAC GGGAATGAAA CGCGCAgCcT TAAAGCAACT 

451 GTCGAACGTT TGGAAAACCA GGAACTCCAT ATTTCGCAAC AGATAGACGG 

501 TCAGAAAAGG CGCATTAGAC TTGCGGAAGA AATGTTGCAG AAATATCGTT 

551 TCCTATCCGC .CAATGA 

This corresponds to the amino acid sequence <SEQ ID 420; ORF107>: 

1 MNRPKQPFFR PEVAVARQTS LTGKVILTRP LSFSLWTTFA SISALLIILF 

51 LIFGNYTRKT TVEGQILPAS GVIRVYAPDT XTITAKFVED GXKVKAGDKL 

101 FALSTSRFGA GGSVQQQLKT EAVLKKTLAE QELGRLKLIH GNETRSLKAT 

151 VERLENQELH ISQQIDGQKR RIRLAEEMLQ KYRFLSXQ* 
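
The correspondence between a DNA listing and its amino acid sequence can be checked with a short translation routine. The sketch below is an illustration only and assumes the standard genetic code; the function name `translate` is hypothetical. Codons containing ambiguity codes (such as the lower-case r, k, m and s in the listing above) or a missing base are reported as X, which is how they appear in the translated sequence.

```python
# Standard genetic code, built from the conventional TCAG ordering.
BASES = "TCAG"
AMINO = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    a + b + c: AMINO[16 * i + 4 * j + k]
    for i, a in enumerate(BASES)
    for j, b in enumerate(BASES)
    for k, c in enumerate(BASES)
}


def translate(dna):
    """Translate DNA in frame 1; any codon that is not four-letter-unambiguous becomes X."""
    dna = dna.upper().replace(" ", "")
    protein = []
    for i in range(0, len(dna) - 2, 3):
        protein.append(CODON_TABLE.get(dna[i:i + 3], "X"))
    return "".join(protein)


# First 30 bases of SEQ ID 419 as listed above.
print(translate("ATGAATAGAC CCAAGCAACC CTTCTTCCGT"))
```

This simple version does not try to resolve ambiguous codons whose alternatives all encode the same residue; it marks every such codon as X.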

Computer analysis of this amino acid sequence gave the following results: 
Homology with a predicted ORF from N.meningitidis (strain A) 

ORF107 shows 97.8% identity over a 186aa overlap with an ORF (ORF107a) from strain A of N. 
meningitidis: 

                        10        20        30        40        50        60
orf107.pep      MNRPKQPFFRPEVAVARQTSLTGKVILTRPLSFSLWTTFASISALLIILFLIFGNYTRKT
                ||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
orf107a         MNRPKQPFFRPEVAVARQTSLTGKVILTRPLSFSLWTTFASISALLIILFLIFGNYTRKT
                        10        20        30        40        50        60

                        70        80        90       100       110       120
orf107.pep      TVEGQILPASGVIRVYAPDTXTITAKFVEDGXKVKAGDKLFALSTSRFGAGGSVQQQLKT
                |||||||||||||||||||| |||||| ||| ||||||||||||||||||| ||||||||
orf107a         TVEGQILPASGVIRVYAPDTGTITAKFXEDGEKVKAGDKLFALSTSRFGAGDSVQQQLKT
                        70        80        90       100       110       120

                       130       140       150       160       170       180
orf107.pep      EAVLKKTLAEQELGRLKLIHGNETRSLKATVERLENQELHISQQIDGQKRRIRLAEEMLQ
                ||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
orf107a         EAVLKKTLAEQELGRLKLIHGNETRSLKATVERLENQELHISQQIDGQKRRIRLAEEMLQ
                       130       140       150       160       170       180

                      189
orf107.pep      KYRFLSXQX
                ||||||
orf107a         KYRFLSANDAVPKQEMMNVKAELLEQKAKLDAYRREEVGLLQEIRTQNLTLXSLPQAAX
                       190       200       210       220       230
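
The identity percentages quoted for these alignments are the number of identical aligned positions divided by the overlap length. The sketch below illustrates that arithmetic under stated assumptions: the two inputs are already aligned and of equal length, gap positions (written "-") are excluded from the overlap, and an unknown residue X paired with a known residue counts as a mismatch. The function name `percent_identity` is hypothetical, and the alignment program used in the original analysis may count edge cases differently.

```python
def percent_identity(a, b):
    """Return (matches, overlap, percent) for two equal-length aligned sequences."""
    if len(a) != len(b):
        raise ValueError("aligned sequences must have the same length")
    # Keep only columns where neither sequence has a gap.
    overlap = [(x, y) for x, y in zip(a, b) if x != "-" and y != "-"]
    matches = sum(1 for x, y in overlap if x == y)
    return matches, len(overlap), 100.0 * matches / len(overlap)


# Residues 61-120 of the ORF107 / ORF107a alignment above.
orf107 = "TVEGQILPASGVIRVYAPDTXTITAKFVEDGXKVKAGDKLFALSTSRFGAGGSVQQQLKT"
orf107a = "TVEGQILPASGVIRVYAPDTGTITAKFXEDGEKVKAGDKLFALSTSRFGAGDSVQQQLKT"
matches, overlap, pct = percent_identity(orf107, orf107a)
print(f"{matches}/{overlap} identical ({pct:.1f}%)")
```

The four mismatches in this block are the only ones in the whole 186 aa overlap, giving 182/186, which agrees with the 97.8% stated above.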

The complete length ORF107a nucleotide sequence <SEQ ID 421> is: 



1 ATGAATAGAC CCAAGCAACC NTTCTTCCGT CCCGAAGTCG CCGTTGCCCG 

51 CCAAACCAGC CTGACGGGTA AAGTGATTCT GACACGACCG TTGTCATTTT 

101 CCCTATGGAC GACATTTGCA TCGATATCTG CGTTATTGAT TATCCTGTTT 

151 TTGATATTTG GTAACTATAC GCGAAAGACA ACAGTGGAGG GACAAATTTT 

201 ACCTGCATCG GGCGTAATCA GGGTGTATGC ACCGGATACG GGGACAATTA 

251 CNGCGAAATT CNTGGAAGAT GGAGAAAAGG TTAAGGCTGG CGACAAGCTA 

301 TTTGCGCTTT CGACCTCACG TTTCGGCGCA GGAGATAGCG TGCAGCAGCA 

351 GTTGAAAACG GAGGCAGTTT TGAAGAAAAC GTTGGCAGAA CAGGAACTGG 

401 GTCGTCTGAA GCTGATACAC GGGAATGAAA CGCGCAGCCT TAAAGCAACT 

451 GTCGAACGTT TGGAAAACCA GGAACTCCAT ATTTCGCAAC AGATAGACGG 

501 TCAGAAAAGG CGCATTAGAC TTGCGGAAGA AATGTTGCAG AAATATCGTT 

551 TCCTATCCGC CAATGATGCA GTGCCAAAAC AAGAAATGAT GAATGTCAAG 

601 GCAGAGCTTT TAGAGCAGAA AGCCAAACTT GATGCCTACC GCCGAGAAGA 

651 AGTCGGGCTG CTTCAGGAAA TCCGCACGCA GAATCTGACA TTGGNNAGCC 

701 TCCCCCAAGC GGCATGA 



This encodes a protein having amino acid sequence <SEQ ID 422>: 



1 MNRPKQPFFR PEVAVARQTS LTGKVILTRP LSFSLWT TFA SISALLIILF 

51 LIFG NYTRKT TVEGQILPAS GVIRVYAPDT GTITAKFXED GEKVKAGDKL 

101 FALSTSRFGA GDSVQQQLKT EAVLKKTLAE QELGRLKLIH GNETRSLKAT 

151 VERLENQELH ISQQIDGQKR RIRLAEEMLQ KYRFLSANDA VPKQEMMNVK 

201 AELLEQKAKL DAYRREEVGL LQEIRTQNLT LXSLPQAA* 



Homology with a predicted ORF from N.gonorrhoeae 

ORF107 shows 95.7% identity over a 188aa overlap with a predicted ORF (ORF107.ng) from N. 
gonorrhoeae: 



orf107.pep      MNRPKQPFFRPEVAVARQTSLTGKVILTRPLSFSLWTTFASISALLIILFLIFGNYTRKT 60
                ||||||||||||||:|||||||||||||||||||||||||||||||||||||||||||||
orf107ng        MNRPKQPFFRPEVAIARQTSLTGKVILTRPLSFSLWTTFASISALLIILFLIFGNYTRKT 60

orf107.pep      TVEGQILPASGVIRVYAPDTXTITAKFVEDGXKVKAGDKLFALSTSRFGAGGSVQQQLKT 120
                |:|||||||||||||||||| |||||||||| ||||||||||||||||||||||||||||
orf107ng        TMEGQILPASGVIRVYAPDTGTITAKFVEDGEKVKAGDKLFALSTSRFGAGGSVQQQLKT 120

orf107.pep      EAVLKKTLAEQELGRLKLIHGNETRSLKATVERLENQELHISQQIDGQKRRIRLAEEMLQ 180
                |||||||||||||||||||| ||||||||||||||||:|||||||||||||||||||||:
orf107ng        EAVLKKTLAEQELGRLKLIHENETRSLKATVERLENQKLHISQQIDGQKRRIRLAEEMLR 180

orf107.pep      KYRFLSXQ 188
                |||||| |
orf107ng        KYRFLSAQ 188

The complete length ORF107ng nucleotide sequence <SEQ ID 423> is predicted to encode a 
protein having amino acid sequence <SEQ ID 424>: 

1 MNRPKQPFFR PEVAIARQTS LTGKVILTRP LSFSLWT TFA SISALLIILF 

51 LIFG NYTRKT TMEGQILPAS GVIRVYAPDT GTITAKFVED GEKVKAGDKL 

101 FALSTSRFGA GGSVQQQLKT EAVLKKTLAE QELGRLKLIH ENETRSLKAT 

151 VERLENQKLH ISQQIDGQKR RIRLAEEMLR KYRFLSAQ* 

Based on the presence of a putative transmembrane domain in the gonococcal protein, it is predicted 
that the proteins from N.meningitidis and N.gonorrhoeae, and their epitopes, could be useful 
antigens for vaccines or diagnostics, or for raising antibodies. 



Example 50 



The following DNA sequence, believed to be complete, was identified in N.meningitidis <SEQ ID 
425>: 



1 ATGCTGAATA CTTTTTTTGC CGTATTGGGC GGCTGCCTGC TGCT.TTGCC 

51 GTGCGGCAAA TCCGTAAATA CGGCGGTACA GCCGCAAAAC GCGGTACAAA 

101 GCGCGCCGAA ACCGGTTTTC AAAGTCATAT ATATCGACAA TACGGCGATT 

151 GCCGGTTTGG ATTTGGGACA AAGCAGCGAA GGCAAAACCA ACGACGGCAA 

201 AAAACAAATC AGTTATCCGA TTAAAGGCTT GCCGGAACAA AATGTTATCC 

251 GACTGATCGG CAAGCATCCC GGCGACTTGG AAGCCGTCAG CGGCAAATGT 

301 ATGGAAACCG ATGATAAGGA CAGTCCGGCA GGTTGGGCAG AAAACGGCGT 

351 GTGCCATACC TTGTTTGCCA AACTGGTGGG CAATATCGCC GAAGACGGCG 

401 GCAAACTGAC GGATTACCTA GTTTCGCATG CCGCCCTGCA ACCCTATCAG 

451 GCAGGCAAAA GCGGCTATGC CGCCGTGCAG AACGGACGCT ATGTGCTGGA 

501 AATCGACAGC GAAGGGGCGT TTTATTTCCG CCGCCGCCAT TATTGA 

This corresponds to the amino acid sequence <SEQ ID 426; ORF108>: 



1 MLNTFFAVLG GCLLXLPCGK SVNTAVQPQN AVQSAPKPVF KVIYIDNTAI 

51 AGLDLGQSSE GKTNDGKKQI SYPIKGLPEQ NVIRLIGKHP GDLEAVSGKC 

101 METDDKDSPA GWAENGVCHT LFAKLVGNIA EDGGKLTDYL VSHAALQPYQ 

151 AGKSGYAAVQ NGRYVLEIDS EGAFYFRRRH Y* 

Further work revealed the following DNA sequence <SEQ ID 427>: 



1 ATGCTGAATA CATCTTTTGC CGTATTGGGC GGCTGCCTGC TGCTTGCCGC 

51 CTGCGGCAAA TCCGAAAATA CGGCGGAACA GCCGCAAAAC GCGGTACAAA 

101 GCGCGCCGAA ACCGGTTTTC AAAGTCAAAT ATATCGACAA TACGGCGATT 

151 GCCGGTTTGG ATTTGGGACA AAGCAGCGAA GGCAAAACCA ACGACGGCAA 

201 AAAACAAATC AGTTATCCGA TTAAAGGCTT GCCGGAACAA AATGTTATCC 

251 GACTGATCGG CAAGCATCCC GGCGACTTGG AAGCCGTCAG CGGCAAATGT 

301 ATGGAAACCG ATGATAAGGA CAGTCCGGCA GGTTGGGCAG AAAACGGCGT 

351 GTGCCATACC TTGTTTGCCA AACTGGTGGG CAATATCGCC GAAGACGGCG 

401 GCAAACTGAC GGATTACCTA GTTTCGCATG CCGCCCTGCA ACCCTATCAG 

451 GCAGGCAAAA GCGGCTATGC CGCCGTGCAG AACGGACGCT ATGTGCTGGA 

501 AATCGACAGC GTAGGGGCGT TTTATTTCCG CCGCCGCCAT TATTGA 

This corresponds to the amino acid sequence <SEQ ID 428; ORF108-1>: 



1 MLKTSFAVLG GCLLLAA CGK SENTAEQPQN AVQSAPKPVF KVKYIDNTAI 

51 AGLDLGQSSE GKTNDGKKQI SYPIKGLPEQ NVIRLIGKHP GDLEAVSGKC 

101 METDDKDSPA GWAENGVCHT LFAKLVGNIA EDGGKLTDYL VSHAALQPYQ 

151 AGKSGYAAVQ NGRYVLEIDS EGAFYFRRRH Y* 

Computer analysis of this amino acid sequence gave the following results: 
Homology with a predicted ORF from N.gonorrhoeae 

ORF108 shows 88.4% identity over a 181aa overlap with a predicted ORF (ORF108.ng) from N. 
gonorrhoeae: 

orf108.pep      MLNTFFAVLGGCLLXLPCGKSVNTAVQPQNAVQSAPKPVFKVIYIDNTAIAGLDLGQSSE 60
                ||:  |||||||||   |||| ||| |||||:|||||||||| |||||||||| ||||||
orf108ng        MLKIPFAVLGGCLLLAACGKSENTAEQPQNAAQSAPKPVFKVKYIDNTAIAGLALGQSSE 60

orf108.pep      GKTNDGKKQISYPIKGLPEQNVIRLIGKHPGDLEAVSGKCMETDDKDSPAGWAENGVCHT 120
                |||||||||||||||||||||::|| ||||:||||| ||||||| ||:|:||||||||||
orf108ng        GKTNDGKKQISYPIKGLPEQNAVRLTGKHPNDLEAVVGKCMETDGKDAPSGWAENGVCHT 120

orf108.pep      LFAKLVGNIAEDGGKLTDYLVSHAALQPYQAGKSGYAAVQNGRYVLEIDSEGAFYFRRRHY 181
                ||||||||||||||||||||:||:|||||||||||||||||||||||||||||||||||||
orf108ng        LFAKLVGNIAEDGGKLTDYLISHSALQPYQAGKSGYAAVQNGRYVLEIDSEGAFYFRRRHY 181



ORF108-1 shows 92.3% identity with ORF108ng over the same 181 aa overlap: 

orf108-1.pep    MLKTSFAVLGGCLLLAACGKSENTAEQPQNAVQSAPKPVFKVKYIDNTAIAGLDLGQSSE 60
                |||  ||||||||||||||||||||||||||:||||||||||||||||||||| ||||||
orf108ng-1      MLKIPFAVLGGCLLLAACGKSENTAEQPQNAAQSAPKPVFKVKYIDNTAIAGLALGQSSE 60

orf108-1.pep    GKTNDGKKQISYPIKGLPEQNVIRLIGKHPGDLEAVSGKCMETDDKDSPAGWAENGVCHT 120
                |||||||||||||||||||||::|| ||||:||||| ||||||| ||:|:||||||||||
orf108ng-1      GKTNDGKKQISYPIKGLPEQNAVRLTGKHPNDLEAVVGKCMETDGKDAPSGWAENGVCHT 120

orf108-1.pep    LFAKLVGNIAEDGGKLTDYLVSHAALQPYQAGKSGYAAVQNGRYVLEIDSEGAFYFRRRHY 181
                ||||||||||||||||||||:||:|||||||||||||||||||||||||||||||||||||
orf108ng-1      LFAKLVGNIAEDGGKLTDYLISHSALQPYQAGKSGYAAVQNGRYVLEIDSEGAFYFRRRHY 181

The complete length ORF108ng nucleotide sequence <SEQ ID 429> is: 

1 ATGCTGAAAa tacctTTTGC CGTGTtgggc ggCtgcctGC TGCTTGCCGC 

51 CTGCGGCAAA TCCGAAAATa cggcggaACA GCCGCAAAAT gcggCACAAA 

101 GCGCGCCGAA ACCGGTTTTC AAAGTCAAAT ACATCGACAA TACGGCGATT 

151 GCCGGTTTGG CTTTGGGACA AAGTAGCGAA GGCAAAACCA acgacgGCAA 

201 AAAACAAATC AGTTATccgA TTAAAGGCTT GCCGGAACAA Aacgccgtcc 

251 gGCTGACCGG AAAGCATCCC AACGACTTGG AagccgtcgT CGGCAAATGT 

301 ATGGAAACCG ACGGAAAGGA CGCGCCTTCG GGCTGGGCGG AAAACGGCGT 

351 GTGCCATACC TTGTTTGCCA AACTGGTGGG CAATATCGCC GAAGACGGCG 

401 GCAAACTGAC TGATTACCTG ATTTCGCATT CCGCCCTGCA ACCCTATCAG 

451 GCAGGCAAAA GCGGCTATGC CGCCGTGCAG AACGGACGCT ATGTGCTGGA 

501 AATCGACAGC GagggGGCGT TTTATttccg ccgccgccat tattgA 

This encodes a protein having amino acid sequence <SEQ ID 430>: 

1 MLKIPF AVLG GCLLLAAC GK SENTAEQPQN AAQSAPKPVF KVKYIDNTAI 

51 AGLAL GQSSE GKT NDGKKQI SYPIKGLPEQ NAVRLTGKHP NDLEAVVGKC 

101 METDGKDAPS GWAENGVCHT LFAKLVGNIA EDGGKLTDYL ISHSALQPYQ 

151 AGKSGYAAVQ NGRYVLEIDS EGAFYFRRRH Y* 

Based on this analysis, including the presence of a predicted prokaryotic membrane lipoprotein 
lipid attachment site (underlined) and a putative ATP/GTP-binding site motif A (P-loop, double- 
underlined) in the gonococcal protein, it is predicted that the proteins from N.meningitidis and 
N.gonorrhoeae, and their epitopes, could be useful antigens for vaccines or diagnostics, or for 
raising antibodies. 
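
A motif of this kind can be located with a simple pattern search. The sketch below assumes the widely used consensus for ATP/GTP-binding site motif A, [AG]-x(4)-G-K-[ST]; the text does not state which tool or pattern definition was used in the original analysis, so both the pattern and the function name `find_p_loops` are assumptions made for illustration.

```python
import re

# Assumed consensus for ATP/GTP-binding site motif A (P-loop): [AG]-x(4)-G-K-[ST].
P_LOOP = re.compile(r"[AG].{4}GK[ST]")


def find_p_loops(protein):
    """Return (1-based start position, matched residues) for each non-overlapping hit."""
    return [(m.start() + 1, m.group()) for m in P_LOOP.finditer(protein)]


# First 70 residues of the gonococcal ORF108ng protein (SEQ ID 430).
orf108ng_start = (
    "MLKIPFAVLGGCLLLAACGKSENTAEQPQNAAQSAPKPVFKVKYIDNTAI"
    "AGLALGQSSEGKTNDGKKQI"
)
hits = find_p_loops(orf108ng_start)
print(hits)
```

The single hit corresponds to the GQSSEGKT stretch in the second row of the listing above.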



Example 51 



The following DNA sequence was identified in N.meningitidis <SEQ ID 431>: 

1 ATGGAAGATT TATATATAAT ACTCGCTTTG GGTTTGGTTG CGATGATTGC 

51 CGgATTTATC GATgcgatTg cGggCGGGGG TGGTTTGATT ACGCTGCCCG 

101 CACTCTTGTT GGCAGGTATT CCTCCCGTGT CGGCAATTGC CACCAACAAG 

151 CTGCAAgCAG CCGCTGCTAC GTTTTCAGCT ACGGTTTCTT TTGCACGCAA 

201 AGGTTTGATT GATTGGAAGA AAGGTCTCCC GATTGCCGCA GCATCGTTTG 

251 TAGGCGGCGT GGcCGGTGCA TTATCGGTCA GCTTGGTTTC CAAAGATATT 

301 CTgCTgGCGG TCGTGCCGGT TTTGTTGATA TTTGTCGCAC TGTATTTTGT 

351 GTTTTCGCCC AAGCTCGACG GCAGTAAGGA AGGCAAAGCC AGAATGTCTT 

401 TTTTTCTGTT cGGGCTGACG GTCGC.ACCG CTTTTGGGTT TTTACGACGG 

451 TGTGTTCGGA CCGGGTGTCG GCTCGTTTTT TCTGATTGCC TTTATTGTTT 

501 TGCTCGGCTG CAAgCTGTTG AACGCGATGT CTTACACCAA ATTGGCGAAC 

551 GTTGCCTGCA ATCTTGGTTC GCTATCGGTA TTCCTGCTGC ACGGTTCGAT 

601 TATTTTCCCG ATTGCGGCAA CGaTGGCGGT CGGTGCGTTT GTCGGtGCGA 

651 ATTTAgGTGC GAGATTTGCC GTaCgctTCG GTTCGAAGCT GATTAA 

This corresponds to the amino acid sequence <SEQ ID 432; ORF109>: 

1 MEDLYIILAL GLVAMIAGFI DAIAGGGGLI TLPALLLAGI PPVSAIATNK 

51 LQAAAATFSA TVSFARKGLI DWKKGLPIAA ASFVGGVAGA LSVSLVSKDI 

101 LLAVVPVLLI FVALYFVFSP KLDGSKEGKA RMSFFLFGLT VXTAFGFLRR 

151 CVRTGCRLVF SDCLYCFARL QAVERDVLHQ IGERCLQSWF AIGIPAARFD 

201 YFPDCGNDGG RCVCRCEFRC EICRTLRFEA D* 

Further work revealed the following DNA sequence <SEQ ID 433>: 

1 ATGGAAGATT TATATATAAT ACTCGCTTTG GGTTTGGTTG CGATGATTGC 

51 CGGATTTATC GATGCGATTG CGGGCGGGGG TGGTTTGATT ACGCTGCCCG 

101 CACTCTTGTT GGCAGGTATT CCTCCCGTGT CGGCAATTGC CACCAACAAG 

151 CTGCAAGCAG CCGCTGCTAC GTTTTCAGCT ACGGTTTCTT TTGCACGCAA 

201 AGGTTTGATT GATTGGAAGA AAGGTCTCCC GATTGCCGCA GCATCGTTTG 

251 TAGGCGGCGT GGCCGGTGCA TTATCGGTCA GCTTGGTTTC CAAAGATATT 

301 CTGCTGGCGG TCGTGCCGGT TTTGTTGATA TTTGTCGCAC TGTATTTTGT 

351 GTTTTCGCCC AAGCTCGACG GCAGTAAGGA AGGCAAAGCC AGAATGTCTT 

401 TTTTTCTGTT CGGGCTGACG GTCGCACCGC TTTTGGGTTT TTACGACGGT 

451 GTGTTCGGAC CGGGTGTCGG CTCGTTTTTT CTGATTGCCT TTATTGTTTT 

501 GCTCGGCTGC AAGCTGTTGA ACGCGATGTC TTACACCAAA TTGGCGAACG 

551 TTGCCTGCAA TCTTGGTTCG CTATCGGTAT TCCTGCTGCA CGGTTCGATT 

601 ATTTTCCCGA TTGCGGCAAC GATGGCGGTC GGTGCGTTTG TCGGTGCGAA 

651 TTTAGGTGCG AGATTTGCCG TCCGCTTCGG TTCGAAGCTG ATTAAGCCGC 

701 TGCTGATTGT CATCAGCATT TCGATGGCTG TGAAATTGTT GATAGACGAG 

751 AGAAATCCGC TGTATCAGAT GATTGTTTCG ATGTTTTAA 

This corresponds to the amino acid sequence <SEQ ID 434; ORF109-1>: 

1 MEDLYIILAL GLVAMIAGFI DAIAGGGGLI TLPALLLAGI PPVSAIA TNK 

51 LQAAAATFSA TVSFARKGLI DWKKGLPI AA ASFVGGVAGA LSVSLV SKDI 

101 LLAVVPVLLI FVALYF VFSP KLDGSKEGKA R MSFFLFGLT VAPLLGFY DG 

151 VFGPG VGSFF LIAFIVLLGC KL LNAMSYTK LANVACNLGS LSVFLLHGSI 

201 IFPIAATMAV GAFVGAN LGA RFAVRFGSKL IK PLLIVISI SMAVKLLID E 

251 RNPLYQMIVS MF* 

Computer analysis of this amino acid sequence gave the following results: 
Homology with a predicted ORF from N.meningitidis (strain A) 

ORF109 shows 95.9% identity over a 147aa overlap with an ORF (ORF109a) from strain A of N. 
meningitidis: 

                        10        20        30        40        50        60
orf109.pep      MEDLYIILALGLVAMIAGFIDAIAGGGGLITLPALLLAGIPPVSAIATNKLQAAAATFSA
                ||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
orf109a         MEDLYIILALGLVAMIAGFIDAIAGGGGLITLPALLLAGIPPVSAIATNKLQAAAATFSA
                        10        20        30        40        50        60

                        70        80        90       100       110       120
orf109.pep      TVSFARKGLIDWKKGLPIAAASFVGGVAGALSVSLVSKDILLAVVPVLLIFVALYFVFSP
                |||||||||||||||||||||||:|||:||||||||||||||||||||||||||||||||
orf109a         TVSFARKGLIDWKKGLPIAAASFAGGVVGALSVSLVSKDILLAVVPVLLIFVALYFVFSP
                        70        80        90       100       110       120

                       130       140       150       160       170       180
orf109.pep      KLDGSKEGKARMSFFLFGLTVXTAFGFLRRCVRTGCRLVFSDCLYCFARLQAVERDVLHQ
                |||||||||||||||||||||   :||
orf109a         KLDGSKEGKARMSFFLFGLTVAPLLGFYDGVFGPGVGSFFLIAFIVLLGCKLLNAMSYTK
                       130       140       150       160       170       180

The complete length ORF109a nucleotide sequence <SEQ ID 435> is: 

1 ATGGAAGATT TATACATAAT ACTCGCTTTG GGTTTGGTTG CGATGATTGC 

51 CGGATTTATC GATGCGATTG CGGGTGGGGG TGGTTTGATT ACGCTGCCTG 

101 CACTCTTGTT GGCAGGTATT CCTCCCGTGT CGGCAATTGC CACCAACAAG 

151 CTGCAAGCAG CCGCTGCTAC GTTTTCGGCT ACGGTTTCTT TTGCACGCAA 

201 AGGTTTGATT GATTGGAAGA AAGGTCTCCC GATTGCGGCA GCATCGTTTG 

251 CAGGCGGCGT GGTCGGTGCA TTATCGGTCA GCTTGGTTTC CAAAGATATT 

301 CTGCTGGCGG TCGTGCCGGT TTTGTTGATA TTTGTCGCGC TGTATTTTGT 

351 GTTTTCGCCC AAGCTCGACG GCAGTAAGGA AGGCAAAGCC AGAATGTCTT 

401 TTTTTCTGTT CGGTCTGACG GTTGCACCAC TTTTGGGTTT TTACGACGGT 

451 GTGTTCGGAC CGGGTGTCGG CTCGTTTTTT CTGATTGCCT TTATTGTTTT 

501 GCTCGGCTGC AAGCTGTTGA ACGCGATGTC TTACACCAAA TTGGCGAACG 

551 TTGCCTGCAA TCTTGGTTCG CTATCGGTAT TCCTGCTGCA CGGTTCGATT 

601 ATTTTCCCGA TTGCGGCAAC GATGGCGGTC GGTGCGTTTG TCGGTGCGAA 

651 TTTAGGTGCG AGATTTGCCG TCCGCTTCGG TTCGAAGCTG ATTAAGCCGC 

701 TGCTGATTGT CATCAGCATT TCGATGGCTG TGAAATTGTT GATAGACGAG 

751 AGAAATCCGC TGTATCAGAT GATTGTTTCG ATGTTTTAA 

This encodes a protein having amino acid sequence <SEQ ID 436>: 

1 MEDLYIILAL GLVAMIAGFI DAIAGGGGLI TLPALLLAGI PPVSAIA TNK 

51 LQAAAATFSA TVSFARKGLI DWKKGLPI AA ASFAGGVVGA LSVSLV SKDI 

101 LLAVVPVLLI FVALYF VFSP KLDGSKEGKA R MSFFLFGLT VAPLLGFY DG 

151 VFGPG VGSFF LIAFIVLLGC KL LNAMSYTK LANVACNLGS LSVFLLHGSI 

201 IFPIAATMAV GAFVGAN LGA RFAVRFGSKL I KPLLIVISI SMAVKLLID E 

251 RNPLYQMIVS MF* 

ORF109a and ORF109-1 show 99.2% identity in 262 aa overlap: 

                        10        20        30        40        50        60
orf109a.pep     MEDLYIILALGLVAMIAGFIDAIAGGGGLITLPALLLAGIPPVSAIATNKLQAAAATFSA
                ||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
orf109-1        MEDLYIILALGLVAMIAGFIDAIAGGGGLITLPALLLAGIPPVSAIATNKLQAAAATFSA
                        10        20        30        40        50        60

                        70        80        90       100       110       120
orf109a.pep     TVSFARKGLIDWKKGLPIAAASFAGGVVGALSVSLVSKDILLAVVPVLLIFVALYFVFSP
                |||||||||||||||||||||||:|||:||||||||||||||||||||||||||||||||
orf109-1        TVSFARKGLIDWKKGLPIAAASFVGGVAGALSVSLVSKDILLAVVPVLLIFVALYFVFSP
                        70        80        90       100       110       120

                       130       140       150       160       170       180
orf109a.pep     KLDGSKEGKARMSFFLFGLTVAPLLGFYDGVFGPGVGSFFLIAFIVLLGCKLLNAMSYTK
                ||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
orf109-1        KLDGSKEGKARMSFFLFGLTVAPLLGFYDGVFGPGVGSFFLIAFIVLLGCKLLNAMSYTK
                       130       140       150       160       170       180

                       190       200       210       220       230       240
orf109a.pep     LANVACNLGSLSVFLLHGSIIFPIAATMAVGAFVGANLGARFAVRFGSKLIKPLLIVISI
                ||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
orf109-1        LANVACNLGSLSVFLLHGSIIFPIAATMAVGAFVGANLGARFAVRFGSKLIKPLLIVISI
                       190       200       210       220       230       240

                       250       260
orf109a.pep     SMAVKLLIDERNPLYQMIVSMFX
                |||||||||||||||||||||||
orf109-1        SMAVKLLIDERNPLYQMIVSMFX
                       250       260

Homology with a predicted ORF from N.gonorrhoeae 

ORF109 shows 98.3% identity over a 231aa overlap with a predicted ORF (ORF109.ng) from N. 
gonorrhoeae: 

orf109.pep      MEDLYIILALGLVAMIAGFIDAIAGGGGLITLPALLLAGIPPVSAIATNKLQAAAATFSA 60
                ||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
orf109ng        MEDLYIILALGLVAMIAGFIDAIAGGGGLITLPALLLAGIPPVSAIATNKLQAAAATFSA 60

orf109.pep      TVSFARKGLIDWKKGLPIAAASFVGGVAGALSVSLVSKDILLAVVPVLLIFVALYFVFSP 120
                |||||||||||||||||||||||:|||:||||||||||||||||||||||||||||||||
orf109ng        TVSFARKGLIDWKKGLPIAAASFAGGVVGALSVSLVSKDILLAVVPVLLIFVALYFVFSP 120

orf109.pep      KLDGSKEGKARMSFFLFGLTVXTAFGFLRRCVRTGCRLVFSDCLYCFARLQAVERDVLHQ 180
                ||||||||||||||||||||| ||||||||||||||||||||||||||||||||||||||
orf109ng        KLDGSKEGKARMSFFLFGLTVATAFGFLRRCVRTGCRLVFSDCLYCFARLQAVERDVLHQ 180

orf109.pep      IGERCLQSWFAIGIPAARFDYFPDCGNDGGRCVCRCEFRCEICRTLRFEAD 231
                |||||||||||||||||||||||||||||||||||||||||||| ||||||
orf109ng        IGERCLQSWFAIGIPAARFDYFPDCGNDGGRCVCRCEFRCEICRPLRFEAD 231

An ORF109ng nucleotide sequence <SEQ ID 437> was predicted to encode a protein having amino 
acid sequence <SEQ ID 438>: 

1 MEDLYIILAL GLVAMIAGFI DAIAGGGGLI TLPALLLAGI PPVSAIA TNK 

51 LQAAAATFSA TVSFARKGLI DWKKGLPIA A ASFAGGVVGA LSVSLV SKDI 

101 LLAVVPVLLI FVALYF VFSP KLDGSKEGKA R MSFFLFGLT VATAFGFL RR 

151 CVRTGCRLVF SDCLYCFARL QAVERDVLHQ IGERCLQSWF AIGIPAARFD 

201 YFPDCGNDGG RCVCRCEFRC EICRPLRFEA D* 

Further work revealed the following gonococcal DNA sequence <SEQ ID 439>: 

1 ATGGAAGATT TATACATAAT ACTCGCTTTG GGTTTGGTTG CGATGATCGC 

51 CGGATTTATC GATGCGATTG CGGGCGGGGG TGGTTTGATT ACGCTGCCTG 

101 CACTCTTGTT GGCAGGTATT CCTCCCGTGT CGGCAATTGC CACCAACAAG 

151 CTGCAAGCAG CCGCTGCTAC GTTTTCGGCT ACGGTTTCTT TTGCACGCAA 

201 AGGTTTGATT GATTGGAAGA AAGGTCTCCC GATTGCCGCA GCATCGTTTG 

251 CAGGCGGCGT GGTCGGTGCA TTATCGGTCA GCTTGGTTTC CAAAGATATT 

301 TTGCTGGCGG TCGTGCCGGT TTTGTTGATA TTTGTCGCGC TGTATTTTGT 

351 GTTTTCGCCC AAGCTCGACG GCAGTAAGGA AGGCAAAGCC AGAATGTCTT 

401 TTTTTCTATT CGGGCTGACG GTTGCACCGC TTTTGGGTTT TTACGACGGT 

451 GTGTTCGGAC CGGGTGTCGG CTCGTTTTTT CTGATTGCCT TTATTGTTTT 

501 GCTCGGCTGC AAGCTGTTGA ACGCGATGTC TTACACCAAA TTGGCGAACG 

551 TTGCTTGCAA TCTTGGTTCG CTATCGGTAT TCCTGCTGCA CGGTTCGATT 

601 ATTTTCCCGA TTGTGGCAAC GATGGCGGTC GGTGCGTTTG TCGGTGCGAA 

651 TTTAGGTGCG AGATTTGCCG TCCGCTTCGG TTCGAAGCTG ATTAAGCCGC 

701 TGCTGATTGT CATCAGCATT TCGATGGCTG TGAAATTGTT GATAGACGAG 

751 AGAAATCCGC TGTATCAGAT GATTGTTTCG ATGTTTTAA 

This corresponds to the amino acid sequence <SEQ ID 440; ORF109ng-l>: 

1 MEDLYIILAL GLVAMIAGFI DAIAGGGGLI TLPALLLAGI PPVSAIA TNK 

51 LQAAAATFSA TVSFARKGLI DWKKGLPI AA ASFAGGVVGA LSVSLV SKDI 

101 LLAVVPVLLI FVALYF VFSP KLDGSKEGKA R MSFFLFGLT VAPLLGFY DG 

151 VFGPG VGSFF LIAFIVLLGC KL LNAMSYTK LANVACNLGS LSVFLLHGSI 

201 IFPIVATMAV GAFVGAN LGA RFAVRFGSKL I KPLLIVISI SMAVKLLID E 

251 RNPLYQMIVS MF* 

ORF109ng-l and ORF109-1 show 98.9% identity in 262 aa overlap: 

                        10        20        30        40        50        60
orf109ng-1.pep  MEDLYIILALGLVAMIAGFIDAIAGGGGLITLPALLLAGIPPVSAIATNKLQAAAATFSA
                ||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
orf109-1        MEDLYIILALGLVAMIAGFIDAIAGGGGLITLPALLLAGIPPVSAIATNKLQAAAATFSA
                        10        20        30        40        50        60

                        70        80        90       100       110       120
orf109ng-1.pep  TVSFARKGLIDWKKGLPIAAASFAGGVVGALSVSLVSKDILLAVVPVLLIFVALYFVFSP
                |||||||||||||||||||||||:|||:||||||||||||||||||||||||||||||||
orf109-1        TVSFARKGLIDWKKGLPIAAASFVGGVAGALSVSLVSKDILLAVVPVLLIFVALYFVFSP
                        70        80        90       100       110       120

                       130       140       150       160       170       180
orf109ng-1.pep  KLDGSKEGKARMSFFLFGLTVAPLLGFYDGVFGPGVGSFFLIAFIVLLGCKLLNAMSYTK
                ||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
orf109-1        KLDGSKEGKARMSFFLFGLTVAPLLGFYDGVFGPGVGSFFLIAFIVLLGCKLLNAMSYTK
                       130       140       150       160       170       180

                       190       200       210       220       230       240
orf109ng-1.pep  LANVACNLGSLSVFLLHGSIIFPIVATMAVGAFVGANLGARFAVRFGSKLIKPLLIVISI
                ||||||||||||||||||||||||:|||||||||||||||||||||||||||||||||||
orf109-1        LANVACNLGSLSVFLLHGSIIFPIAATMAVGAFVGANLGARFAVRFGSKLIKPLLIVISI
                       190       200       210       220       230       240

                       250       260
orf109ng-1.pep  SMAVKLLIDERNPLYQMIVSMFX
                |||||||||||||||||||||||
orf109-1        SMAVKLLIDERNPLYQMIVSMFX
                       250       260

In addition, ORF109ng-l shows homology to a hypothetical Pseudomonas protein: 

sp|P29942|YCB9_PSEDE HYPOTHETICAL 27.4 KD PROTEIN IN COBO 3' REGION (ORF9) 
>gi|94984|pir||I38164 hypothetical protein 9 - Pseudomonas sp >gi|551929 
(M62866) ORF9 [Pseudomonas denitrificans] Length = 261 
Score = 175 bits (439), Expect = 3e-43 

Identities = 83/214 (38%), Positives = 131/214 (60%), Gaps = 1/214 (0%) 

Query: 41 PPVSAIATNKLQXXXXXXXXXXXXXRKGLIDWKKGLPIXXXXXXXXXXXXXXXXXXXKDI 100 

PP+ + TNKLQ R+G ++ K+ LP+ D+ 

Sbjct: 43 PPLQTLGTNKLQGLFGSGSATLSYARRGHVNLKEQLPMALMSAAGAVLGALLATIVPGDV 102 

Query: 101 LLAVVPVLLIFVALYFVFSPKLDGSKEGKARMSFFLFGLTVAPLLGFYDGVFGPGVGSFF 160 

L A++P LLI +ALYF P + G + +R++ F+F LT+ PL+GFYDGVFGPG GSFF 
Sbjct: 103 LKAILPFLLIAIALYFGLKPNM-GDVDQHSRVTPFVFTLTLVPLIGFYDGVFGPGTGSFF 161 

Query: 161 LIAFIVLLGCKLLNAMSYTKLANVACNLGSLSVFLLHGSIIFPIVATMAVGAFVGANLGA 220 

++ F+ L G +L A ++TK N N+G+ VFL G++++ + M +G F+GA +G+ 
Sbjct: 162 MLGFVTLAGFGVLKATAHTKFLNFGSNVGAFGVFLFFGAVLWKVGLLMGLGQFLGAQVGS 221 

Query: 221 RFAVRFGSKLIKPLLIVISISMAVKLLIDERNPL 254 

R+A+ G+K+IKPLL+++SI++A++LL D +PL 
Sbjct: 222 RYAMAKGAKIIKPLLVIVSIALAIRLLADPTHPL 255 

Based on this analysis, including the presence of a putative leader sequence (double-underlined) 
and several putative transmembrane domains (single-underlined) in the gonococcal protein, it is 
predicted that the proteins from N.meningitidis and N.gonorrhoeae, and their epitopes, could be 
useful antigens for vaccines or diagnostics, or for raising antibodies. 
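
Putative transmembrane stretches such as those mentioned above are commonly screened for with a sliding-window hydropathy average. The sketch below uses the Kyte-Doolittle hydropathy scale with a 19-residue window and a cutoff of 1.6, which is a conventional choice for this scale; the text does not say which prediction method was actually used, so this is an illustration under that assumption and the function names are hypothetical.

```python
# Kyte-Doolittle hydropathy values for the 20 standard amino acids.
KD = {
    "A": 1.8, "R": -4.5, "N": -3.5, "D": -3.5, "C": 2.5,
    "Q": -3.5, "E": -3.5, "G": -0.4, "H": -3.2, "I": 4.5,
    "L": 3.8, "K": -3.9, "M": 1.9, "F": 2.8, "P": -1.6,
    "S": -0.8, "T": -0.7, "W": -0.9, "Y": -1.3, "V": 4.2,
}


def hydropathy_profile(protein, window=19):
    """Mean hydropathy of every full window; unknown residues (e.g. X) score 0."""
    scores = [KD.get(aa, 0.0) for aa in protein]
    return [
        sum(scores[i:i + window]) / window
        for i in range(len(scores) - window + 1)
    ]


def hydrophobic_windows(protein, window=19, threshold=1.6):
    """1-based start positions of windows whose mean hydropathy reaches the threshold."""
    profile = hydropathy_profile(protein, window)
    return [i + 1 for i, mean in enumerate(profile) if mean >= threshold]


# First 40 residues of ORF109ng-1 (SEQ ID 440) as an example input.
starts = hydrophobic_windows("MEDLYIILALGLVAMIAGFIDAIAGGGGLITLPALLLAGI")
print(starts)
```

Runs of consecutive start positions indicate a hydrophobic segment long enough to span a membrane; a dedicated predictor would normally be used to confirm any such candidate.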



Example 52 

The following partial DNA sequence was identified in N. meningitidis <SEQ ID 441>: 

1 . . CTGCTAGGGI ATTGCATCGG TTATCGGTAC GgCTGTTGCA GCAAAACCAG 

51 CCGCAGACGG ATTATTTGGT CAAATTCGGA TCGTTTTGGG CGAG.ATTTT 

101 TGGTTTTCTG GGACTGTATG ACGTCTATGC TTCGGCATGG TTTGTCGTTA 

151 TCATGATGTT TTTGGTGGTT TCTACCAGTT TGTGCCTGAT TCGCAATGTG 

201 CCGCCGTTCT GGCGCGAAAT GAAGTCTTTT CGGGAAAAGG TTAAAGAAAA 

251 ATCTCTGGCG GCGATGCGCC ATTCTTCGCT GTTGGATGTA AAAATTGCGC 

301 CCGAGGTTGC CAAACGTTAT CTGGAAGTAC AAGGTTTTCA GGGGAAAACC 

351 ATTAACCGTG AAGACGGGTC GGTTCTGATT GCCGCCAAAA AAGGCACAAT 

401 GAACAAATGG GGCTATATCT TTGCCCATGT TGCTTTGATT GTCATTTGCC 

451 TGGGCGGGTT GATAGACAGT AACCTGCTGT TGAAACTGGG TATGCTGACC 

501 GGTCGGATTG TTCCGGACAA TCAGGCGGTT TATGCCAAGG ATTTC.AAGC 

551 CCGAAAGTAT .TTTGGGTGC gTCCAATCTC TCATTTAGGG GCAACGTCAA 

601 TATTTCCG.A GGGGCAGAgT GCGGATGTGG TTTTCCTGA 

This corresponds to the amino acid sequence <SEQ ID 442; ORF110>: 

1 ..LLGIASVIGT LLQQNQPQTD YLVKFGSFWA XIFGFLGLYD VYASAWFVVI 

51 MMFLVVSTSL CLIRNVPPFW REMKSFREKV KEKSLAAMRH SSLLDVKIAP 

101 EVAKRYLEVQ GFQGKTINRE DGSVLIAAKK GTMNKWGYIF AHVALIVICL 

151 GGLIDSNLLL KLGMLTGRIF RTIRRFMPRI XKPESXFGCV QSLI*GQRQY 

201 FXRGRVRMWF S* 

Computer analysis of this amino acid sequence gave the following results: 

Homology with ORF88a from N.meningitidis (strain A) 

ORF110 shows 91.5% identity over a 188aa overlap with ORF88a from strain A of N. meningitidis: 



                        10        20        30        40        50        60
orf88a.pep      MSKSRRSPPLLSRPWFAFFSSMRFAVALLSLLGIASVIGTVLQQNQPQTDYLVKFGSFWA
                                              ||||||||||:|||||||||||||||||||
orf110                                        LLGIASVIGTLLQQNQPQTDYLVKFGSFWA
                                                      10        20        30

                        70        80        90       100       110       120
orf88a.pep      QIFGFLGLYDVYASAWFVVIMMFLVVSTSLCLIRNVPPFWREMKSFREKVKEKSLAAMRH
                 |||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
orf110          XIFGFLGLYDVYASAWFVVIMMFLVVSTSLCLIRNVPPFWREMKSFREKVKEKSLAAMRH
                        40        50        60        70        80        90

                       130       140       150       160       170       180
orf88a.pep      SSLLDVKIAPEVAKRYLEVQGFQGKTINREDGSVLIAAKKGTMNKWGYIFAHVALIVICL
                ||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
orf110          SSLLDVKIAPEVAKRYLEVQGFQGKTINREDGSVLIAAKKGTMNKWGYIFAHVALIVICL
                       100       110       120       130       140       150

                       190       200       210       220       230       240
orf88a.pep      GGLIDSNLLLKLGMLTGRIVPDNQAVYAKDFKPESILGASNLSFRGNVNISEGQSADVVF
                |||||||||||||||||||    :  : :  |||| :|
orf110          GGLIDSNLLLKLGMLTGRIFRTIRRFMPRIXKPESXFGCVQSLIXGQRQYFXRGRVRMWF
                       160       170       180       190       200       210

                       250       260       270       280       290       300
orf88a.pep      LNADNGILVQDLPFEVKLKKFHIDFYNTGMPRDFASDIEVTDKATGEKLERTIRVNHPLT

orf110          SX

However, ORF88 and ORF110 do not align, because they represent two different fragments of the 
same protein. 

Homology with a predicted ORF from N. gonorrhoeae 

ORF110 shows 88.6% identity over a 211aa overlap with a predicted ORF (ORF110.ng) from N. 
gonorrhoeae: 

orf110.pep                                    LLGIASVIGTLLQQNQPQTDYLVKFGSFWA 30
                                              ||||||||||:||||||||||||||| ||:
orf110ng        MSKSRISPTLLSRPWFAFFSSMRFAVALLSLLGIASVIGTVLQQNQPQTDYLVKFGPFWT 60

orf110.pep      XIFGFLGLYDVYASAWFVVIMMFLVVSTSLCLIRNVPPFWREMKSFREKVKEKSLAAMRH 90
                 || ||||||||||||||||||||||||||||||||||||||||||||||||||||||||
orf110ng        RIFDFLGLYDVYASAWFVVIMMFLVVSTSLCLIRNVPPFWREMKSFREKVKEKSLAAMRH 120

orf110.pep      SSLLDVKIAPEVAKRYLEVQGFQGKTINREDGSVLIAAKKGTMNKWGYIFAHVALIVICL 150
                |||||||||||||||||||:||||||::||||||||||||||||||||| ||||||||||
orf110ng        SSLLDVKIAPEVAKRYLEVRGFQGKTVSREDGSVLIAAKKGTMNKWGYIXAHVALIVICL 180

orf110.pep      GGLIDSNLLLKLGMLTGRIFRTIRRFMPRIXKPESXFGCVQSLIXGQRQYFXRGRVRMWF 210
                | ||: |||||||||:| |||: || |||| |||| :| ||||| |||||| ||:|||||
orf110ng        GRLINXNLLLKLGMLAGSIFRNNRRVMPRISKPESIWGGVQSLIKGQRQYFQRGKVRMWF 240

orf110.pep      S 211
                |
orf110ng        S 241

The complete length ORF110ng nucleotide sequence <SEQ ID 443> is predicted to encode a 
protein having amino acid sequence <SEQ ID 444>: 

1 MSKSRISPTL LSRPWFAFFS SMRF AVALLS LLGIASVIGT VL QQNQPQTD 

51 YLVKFGPFWT RIFDFLGLYD VYASAW FVVI MMFLVVSTSL CLI RNVPPFW 

101 REMKSFREKV KEKSLAAMRH SSLLDVKIAP EVAKRYLEVR GFQGKTVSRE 

151 DGSVLIAAKK GTMNKWGYIX AHVALIVICL GRLINXN LLL KLGMLAGSIF 

201 RNNRRVMPRI SKPESIWGGV QSLIKGQRQY FQRGKVRMWF S* 

Based on the putative transmembrane domains in the gonococcal protein, it is predicted that the 
proteins from N.meningitidis and N.gonorrhoeae, and their epitopes, could be useful antigens for 
vaccines or diagnostics, or for raising antibodies. 

Example 53 

The following DNA sequence was identified in N.meningitidis <SEQ ID 445>: 

1 ATGCCGTCTG AAACACGCCT GCCGAACTTT ATCCGCGTCT TGATATTTGC 

51 CCTGGGTTTC ATCTTCCTGA ACGCCTGTTC GGAACAAACC GCGCAAACCG 

101 TTACCCTGCA AGGCGAAACG ATGGGCACGA CCTATACCGT CAAATACCTT 

151 TCAAATAATC GGGACAAACT CCCCTCACCT GCCGAAATAC AAAAACGCAT 

201 CGATGACGCG CTTAAAGAAG TCAACCGGCA GATGTCCACC TATCAGCCCG 

251 ACTCCGAAAT CAGCCGGTTC AACCAACACA CAGCCGGCAA GCCCCTCCGC 

301 ATTTCAAGCG ACTTCGCACA CGTTACTGCC GAAGCCGTCC GCCTGAACCG 

351 CCTGACACAC GGCGCGCTGG ACGTAACCGT CGGCCCCTTG GTCAACCTTT 

401 GGGGATTCGG CCCCGACAAA TCCGTTACCC GTGAACCGTC GCCGGAACAA 

451 ATCAAACAGG CGGCATCTTA TACGGGCATA GACAAAATCA TTTTGAAACA 

501 AGGCAAAGAT TACGCTTCCT TGAGCAAAAC CCACCCCAAG GCCTATTTGG 

551 ATTTATCTTC GATTGCCAAA GGCTTCGGCG TTGATAAAGT TGCGGGCGAA 

601 CTGGAAAAAT ACGGCATTCA AAATTATCTG GTCGAAATCG GCGGCGAGTT 

651 GCACGGCAAA GGCAAAAACG CGCGCGGCGA ACCGTGGCGC ATCGGTATCG 

701 AGCAGCCCAA TATCGTCCAA GGCGGCAATA CGCAGATTAT CGTCCCGCTG 

751 AACAACCGTT CGCTTGCCAC TTCCGGCGAT TACCGTATTT TCCACGTCGA 

801 TAAAAACGGC AAACGCCTCT CCCATATCAT CAACCCGAAC AACAAACGAC 

851 CCATCAGCCA CAACCTCGCC TCCATCAGCG TGGTCGCAGA CAGTGCGATG 

901 ACGGCGGACG GCTTGTCCAC AGGATTATTC GTATTGGGCG AAACCGAAGC 

951 CTTAAAGCTG GCAGAGCGCG AAAAACTCGC TGTTTTCCTG ATTGTCAGGG 

1001 ATAAAGGCGG CTACCGCACC GCCATGTCTT CCGAATTTGA AAAACTGCTC 

1051 CGCTAA 

This corresponds to the amino acid sequence <SEQ ID 446; ORF111>: 



1 MPSETRLPNF IRVLIFALGF IFLNA CSEQT AQTVTLQGET MGTTYTVKYL 

51 SNNRDKLPSP AEIQKRIDDA LKEVNRQMST YQPDSEISRF NQHTAGKPLR 

101 ISSDFAHVTA EAVRLNRLTH GALDVTVGPL VNLWGFGPDK SVTREPSPEQ 

151 IKQAASYTGI DKIILKQGKD YASLSKTHPK AYLDLSSIAK GFGVDKVAGE 

201 LEKYGIQNYL VEIGGELHGK GKNARGEPWR IGIEQPNIVQ GGNTQIIVPL 

251 NNRSLATSGD YRIFHVDKNG KRLSHIINPN NKRPISHNLA SISVVADSAM 

301 TADGLSTGLF VLGETEALKL AEREKLAVFL IVRDKGGYRT AMSSEFEKLL 

351 R* 

Computer analysis of this amino acid sequence gave the following results: 
Homology with a predicted ORF from N.meningitidis (strain A) 

ORF111 shows 96.9% identity over a 351aa overlap with an ORF (ORF111a) from strain A of N. 
meningitidis: 

10 20 30 40 50 60 

orf111a.pep MPSETRLPNFIRTLIFALSFIFLNACSEQTAQTVTLQGETMGTTYTVKYLSNNRDXLPSP 
t I I M I I 1! M t: I I I I I : I i I I I I I I I I I I I I 1 I I i I I M I I I t I t I I t t I I I i i I t I 
orf111 MPSETRLPNFIRVLIFALGFIFLNACSEQTAQTVTLQGETMGTTYTVKYLSNNRDKLPSP 

10 20 30 40 50 60 

70 80 90 100 110 120 

orf 111a . pep AEIQXRIDDALKEVNRQMSTYQPDSEISRFNQHTAGKPLRISSDFAHVTAEAVHLNRLTH 
I I I I I t I I I I I I I I I I I I I I I I I I I i I ) I I I t t I t I I I t I I I I I I t I I I I I I : I I M M 
orf 111 AEIQKRIDDALKEVNRQMSTYQPDSEISRFNQHTAGKPLRISSDFAHVTAEAVRLNRLTH 

70 80 90 100 110 120 



130 140 150 160 170 180 

orf 111a . pep GALDVTVGPLVNLWGFGPDKSVTREPSPEQIKQAASYTGIDKIILKQGKDYASLSKTHPK 
I I I I it i I I I I I I I I i I 1 I I I I i I I I I I I I i I I I I i I I i I f I I I i I I I I I I I I I I I I I I I 
orf 111 GALDVTVGPLVNLWGFGPDKSVTREPSPEQIKQAASYTGIDKIILKQGKDYASLSKTHPK 

130 140 150 160 170 180 



190 200 210 220 230 240 

orf 111a . pep AYLDLSSIAKGFGVDXVAGELEKYGIQNYLVEIGGELHGKXKNARGEPWRIGIEQPNIVQ 
I I i M M i I I I I I I I I I M I I t I I I I i I I I I I I I I I I I I I I I I I I I t n I I I I I [ i t I 
orf 111 AYLDLSSIAKGFGVDKVAGELEKYGIQNYLVEIGGELHGKGKNARGEPWRIGIEQPNIVQ 

190 200 210 220 230 240 



250 260 270 280 290 300 

orf 111a . pep GGNTQIIVPLNNRSXATSGDYRIFHVDKSGKRLSHIINPNNKRPISHNLASISVXADSAM 
I I I I I It I I t I i I I I I I I I I I M I I I I : I I I I I I I I I I I I I I I I I t I t i I I I I i I I t I 
orf 111 GGNTQIIVPLNNRSLATSGDYRIFHVDKNGKRLSHIINPNNKRPISHNLASISVVADSAM 

250 260 270 280 290 300 

310 320 330 340 350 

orf 111a . pep TADGXSTGLFVLGETEALKLAEREKLAVFLIVRDKGGYRTAMSSEFEKLLRX 
I I I I I I i i I i I I I I M M i N I I I I i t I I 1 I I I i I M I I I I I I I I i t i I I I 
orf 111 TADGLSTGLFVLGETEALKLAEREKLAVFLIVRDKGGYRTAMSSEFEKLLRX 

310 320 330 340 350 
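The identity figure quoted for this comparison can be illustrated with a minimal sketch. It assumes a simple column-by-column count over an already aligned pair, with an ambiguous residue (`X`) counted as a mismatch; the alignment program used in the original analysis may score such positions differently. The function name `percent_identity` is hypothetical, and only residues 1-60 of the two sequences listed above are compared.

```python
def percent_identity(a: str, b: str) -> float:
    """Percentage of aligned columns that hold the same residue ('-' marks a gap)."""
    if len(a) != len(b):
        raise ValueError("aligned sequences must have equal length")
    if not a:
        raise ValueError("empty alignment")
    # Count columns where both sequences carry the same, non-gap residue.
    matches = sum(1 for x, y in zip(a, b) if x == y and x != "-")
    return 100.0 * matches / len(a)


# Residues 1-60 of ORF111a and ORF111, copied from the sequence listings above.
orf111a = "MPSETRLPNFIRTLIFALSFIFLNACSEQTAQTVTLQGETMGTTYTVKYLSNNRDXLPSP"
orf111 = "MPSETRLPNFIRVLIFALGFIFLNACSEQTAQTVTLQGETMGTTYTVKYLSNNRDKLPSP"

segment_identity = percent_identity(orf111a, orf111)
print(f"{segment_identity:.1f}% identity over {len(orf111)} aa")
```

This covers only the first 60 residues, so its result is not expected to equal the 96.9% reported for the full 351aa overlap.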

The complete length ORF111a nucleotide sequence <SEQ ID 447> is: 



1 ATGCCGTCTG AAACACGCCT GCCGAACTTT ATCCGCACCT TGATATTTGC 

51 CCTGAGTTTT ATCTTCCTGA ACGCCTGTTC GGAACAAACC GCGCAAACCG 

101 TTACCCTGCA AGGTGAAACG ATGGGCACGA CCTATACCGT CAAATACCTT 

151 TCAAATAATC GGGACNAACT CCCNTCACCT GCCGAAATAC AAAANCGCAT 

201 CGATGACGCG CTTAAAGAAG TCAACCGGCA GATGTCCACC TATCAGCCCG 

251 ACTCCGAAAT CAGCCGGTTC AACCAACACA CAGCCGGCAA GCCCCTCCGC 

301 ATTTCAAGCG ACTTCGCACA CGTTACTGCC GAAGCCGTCC ACCTGAACCG 

351 CCTGACACAC GGCGCGCTGG ACGTAACCGT CGGCCCCTTG GTCAACCTTT 

401 GGGGATTCGG CCCCGACAAA TCCGTTACCC GTGAACCGTC GCCGGAACAA 

451 ATCAAACAAG CAGCATCTTA TACGGGCATA GACAAAATCA TTTTGAAACA 

501 AGGCAAAGAT TACGCTTCCT TGAGCAAAAC CCACCCCAAG GCCTATTTGG 

551 ATTTATCTTC GATTGCCAAA GGCTTCGGCG TTGATNANGT TGCGGGCGAA 

601 CTGGAAAAAT ACGGCATTCA AAATTATCTG GTCGAAATCG GCGGNGAGTT 

651 GCACGGCAAA GNCAAAAACG CGCGCGGCGA ACCTTGGCGC ATCGGCATCG 

701 AACAGCCCAA CATCGTCCAA GGCGGCAATA CGCAGATTAT CGTCCCGCTG 

751 AACAACCGTT CGNTTGCCAC TTCCGGCGAT TACCGTATTT TCCACGTCGA 

801 TAAAAGCGGC AAACGCCTCT CCCATATCAT TAATCCGAAC AACAAACGAC 

851 CCATCAGCCA CAACCTCGCC TCCATCAGCG TGNTCGCAGA CAGTGCGATG 

901 ACGGCGGACG GCTTNTCCAC AGGATTATTC GTATTGGGCG AAACCGAAGC 

951 CTTAAAGCTG GCAGAGCGCG AAAAACTCGC TGTTTTCCTG ATTGTCAGGG 

1001 ATAAAGGCGG CTACCGCACC GCCATGTCTT CCGAATTTGA AAAACTGCTC 

1051 CGCTAA 

This encodes a protein having amino acid sequence <SEQ ID 448>: 



1 MPSETRLPNF IRTLIFALSF IFLNA CSEQT AQTVTLQGET MGTTYTVKYL 
51 SNNRDXLPSP AEIQXRIDDA LKEVNRQMST YQPDSEISRF NQHTAGKPLR 
101 ISSDFAHVTA EAVHLNRLTH GALDVTVGPL VNLWGFGPDK SVTREPSPEQ 

151 IKQAASYTGI DKIILKQGKD YASLSKTHPK AYLDLSSIAK GFGVDXVAGE 

201 LEKYGIQNYL VEIGGELHGK XKNARGEPWR IGIEQPNIVQ GGNTQIIVPL 

251 NNRSXATSGD YRIFHVDKSG KRLSHIINPN NKRPISHNLA SISVXADSAM 

301 TADGXSTGLF VLGETEALKL AEREKLAVFL IVRDKGGYRT AMSSEFEKLL 

351 R* 
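The relationship between a nucleotide listing and the protein it encodes can be checked with a short translation routine. The sketch below assumes the standard genetic code and reading frame 1; the names `CODON_TABLE` and `translate` are illustrative, and any codon containing an ambiguous base (such as `N`) is simply reported as `X`, which is a simplification.

```python
# Standard genetic code, built in the conventional T, C, A, G order.
BASES = "TCAG"
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    a + b + c: AMINO_ACIDS[16 * i + 4 * j + k]
    for i, a in enumerate(BASES)
    for j, b in enumerate(BASES)
    for k, c in enumerate(BASES)
}


def translate(dna: str) -> str:
    """Translate DNA in frame 1; unknown or ambiguous codons become 'X'."""
    dna = "".join(dna.split()).upper()  # drop the spaces between 10-base groups
    protein = []
    for pos in range(0, len(dna) - 2, 3):
        protein.append(CODON_TABLE.get(dna[pos:pos + 3], "X"))
    return "".join(protein)


# First 30 bases of the ORF111a nucleotide sequence <SEQ ID 447>.
first_row = "ATGCCGTCTG AAACACGCCT GCCGAACTTT"
print(translate(first_row))
```

The printed peptide can be compared with the first ten residues of <SEQ ID 448>; the final row of the nucleotide listing (`CGCTAA`) translates to `R*` in the same way.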



Homology with a predicted ORF from N.gonorrhoeae 

ORF111 shows 96.6% identity over a 351aa overlap with a predicted ORF (ORF111.ng) from N. 
gonorrhoeae: 



10 20 30 40 50 60 

orflllng MPSETRLPNLIRALIFALGFIFLNACSEQTAQTVTLQGETMGTTYTVKYLSNNRDKLPSP 
M I I I I I I t : I I : M I I I I i I I I t I I I I I I I I I I I I I I I I I I M I I I I I 1 I i I I I I t t I I 
orflll MPSETRLPNFIRVLIFALGFIFLNACSEQTAQTVTLQGETMGTTYTVKYLSNNRDKLPSP 

10 20 30 40 50 60 



70 80 90 100 110 120 

orf111ng AKIQKRIDDALKEVNRQMSTYQTDSEISRFNQHTAGKPLRISSDFAHVTAEAVRLNRLTH 
I : I t I I I t I I I I I I I I ! I I I I I I I I I I I I I I I I I I I I I I I t I I I I I i I I I I i I I f t I I I 
orflll AEIQKRIDDALKEVNRQMSTYQPDSEISRFNQHTAGKPLRISSDFAHVTAEAVRLNRLTH 

70 80 90 100 110 120 



130 140 150 160 170 180 

orflllng GALDVTVGPLVNLWGFGPDKSVTREPSPEQIKQAASYTGIDKIILQQGKDYASLSKTHPK 
I t I I t I I I i t t i I I I I I I i t M I I i I I I I I I I I I t I I I I I I I I I I : 1 I I I I I I M t It I i 
orflll GALDVTVGPLVNLWGFGPDKSVTREPSPEQIKQAASYTGIDKIILKQGKDYASLSKTHPK 

130 140 150 160 170 180 



190 200 210 220 230 240 

orflllng AYLDLSSIAKGFGVDKVAGELEKYGIQNYLVEIGGELHGKGKNAHGEPWRIGIEQPNIIQ 
I I I I t I I I M I I I I M I i I I I I I I I I [ I I I t I I i i I I I I I M I I : t i I I I I I t I I I I I : I 
orflll AYLDLSSIAKGFGVDKVAGELEKYGIQNYLVEIGGELHGKGKNARGEPWRIGIEQPNIVQ 

190 200 210 220 230 240 

250 260 270 280 290 300 

orf111ng GGNTQIIVPLNNRSLATSGDYRIFHVDKNGKRLSHIINPNNKRPISHNLASISVVSDSAM 
I I t I I i I t I I i M I i I I I I I I I I I I I I I 11 I I I I I I I I t I i I i I I I I I I I I I I I I : I I I I 
orf111 GGNTQIIVPLNNRSLATSGDYRIFHVDKNGKRLSHIINPNNKRPISHNLASISVVADSAM 

250 260 270 280 290 300 



310 320 330 340 350 

orf111ng TADGLSTGLFVLGETEALRLAEQEKLAVFLIVRDKDGYRTAMSSEFAKLLRX 
i i I t I I [ t I I I I I I t I n : I i I : I I I I M M I I I I I I I t I I t t I I Mill 
orf111 TADGLSTGLFVLGETEALKLAEREKLAVFLIVRDKGGYRTAMSSEFEKLLRX 
310 320 330 340 350 

The complete length ORF111ng nucleotide sequence <SEQ ID 449> is: 



1 ATGCCGTCTG AAACACGCCT GCCGAACCTT ATCCGCGCCT TGATATTTGC 

51 CCTGGGTTTC ATCTTCCTGA ACGCCTGTTC GGaacaaacC GCGCAaaccg 

101 TTACCCTGCA AGGCGAAAcg aTGGGTACGA CCTATACCGT CAAATACCTT 

151 TCAAATAATC GGGACAAACT CCCCTCCCCT GCCAAAATAC AAAAGCGCAT 

201 TGATGATGCG CTTAAAGAAG TCAACCGGCA GATGTCCACC TACCAGACCG 

251 ATTCCGAAAT CAGCCGGTTC AACCAACACA CAGCCGGCAA GCCCCTCCGC 

301 ATTTCAAGCG ATTTCGCACA CGTTACCGCC GAAGCCGTCC GCCTGAACCG 

351 CCTGACTCAC GGCGCACTGG ACGTAACCGT CGGCCCTTTG GTCAACCTTT 

401 GGGGGTTCGG CCCCGACAAA TCCGTTACCC GTGAACCGTC GCCGGAACAA 

451 ATCAAACAGG CGGCATCTTA TACGGGCATA GACAAAATCA TTTTGCAACA 

501 AGGCAAAGAT TACGCTTCCT TGAGCAAAAC CCACCCCAAA GCCTATTTGG 

551 ATTTATCTTC GATTGCCAAA GGCTTCGGCG TTGATAAAGT TGCGGGCGAA 

601 CTGGAAAAAT ACGGCATTCA AAATTATCTG GTCGAAAtcg gcggcGAGTT 

651 GCACGGCAAA GGCAAAAATG CGCACGGCGA ACCGTGGCGC ATCGGTATAG 

701 AGCAACCCAA TATCATCCAA GgcgGCAata CGCAGATTAt cgtcccgctg 

751 aaCaaccgtt cgctTGCCAC TTCCGGCGAT TAccgtaTTT tccacgtcgA 

801 TAAAAAcggc aaacgccttt cccacaTCAT CAATCCCaAC aacAAACgac 

851 ccATCAGcca caacctcgcc tccatcagcg tggtctcAGA CAGTGCAATG 

901 ACGGCGGACG GTTtatCCAC AGGATTATTT GTTTTAGGCG AAACCGAAGC 

951 CTTAAGGCTG GCAGAACAAG AAAAACTCGC TGTTTTCCTA ATTGTCCGGG 

1001 ATAAGGACGG CTACCGCACC GCCATGTCTT CCGAATTTGC CAAGCTGCTC 

1051 CGCTAA 

This encodes a protein having amino acid sequence <SEQ ID 450>: 

1 MPSETRLPNL IRALIFALGF IFLNA CSEQT AQTVTLQGET MGTTYTVKYL 

51 SNNRDKLPSP AKIQKRIDDA LKEVNRQMST YQTDSEISRF NQHTAGKPLR 

101 ISSDFAHVTA EAVRLNRLTH GALDVTVGPL VNLWGFGPDK SVTREPSPEQ 

151 IKQAASYTGI DKIILQQGKD YASLSKTHPK AYLDLSSIAK GFGVDKVAGE 

201 LEKYGIQNYL VEIGGELHGK GKNAHGEPWR IGIEQPNIIQ GGNTQIIVPL 

251 NNRSLATSGD YRIFHVDKNG KRLSHIINPN NKRPISHNLA SISVVSDSAM 

301 TADGLSTGLF VLGETEALRL AEQEKLAVFL IVRDKDGYRT AMSSEFAKLL 

351 R* 

This protein shows homology with a hypothetical lipoprotein precursor from H.influenzae: 

sp|P44550|YOJL_HAEIN HYPOTHETICAL LIPOPROTEIN HI0172 PRECURSOR >gi | 1074292 | pir | 4 
hypothetical protein HI0172 - Haemophilus influenzae (strain Rd KW20) 
>gi|1573128 (U32702) hypothetical [Haemophilus influenzae] Length = 346 
Score = 353 bits (896), Expect = 9e-97 

Identities = 181/344 (52%), Positives = 247/344 (71%), Gaps = 4/344 (1%) 

Query: 7 LPNLIRALIFALGFIFLNACSEQTAQTVTLQGETMGTTYTVKYLSNNRDKLPSPAKIQKR 66 

+ LI +1 + L AC ++T + ++L G+TMGTTY VKYL + S K + 

Sbjct: 1 MKKLISGIIAVAMALSLAACQKET-KVISLSGKTMGTTYHVKYLDDGSITATSE-KTHEE 58 

Query: 67 IDDALKEVNRQMSTYQTDSEISRFNQHT-AGKPLRISSDFAHVTAEAVRLNRLTHGALDV 125 

I+ LK+VN +MSTY+ DSE+SRFNQ+T P+ IS+DFA V AEA+RLN++T GALDV 
Sbjct: 59 IEAILKDVNAKMSTYKKDSELSRFNQNTQVNTPIEISADFAKVLAEAIRLNKVTEGALDV 118 

Query: 126 TVGPLVNLWGFGPDKSVTREPSPEQIKQAASYTGIDKIILQQGKDYASLSKTHPKAYLDL 185 

TVGP+VNLWGFGP+K ++P+PEQ+ + ++ GIDKI L K+ A+LSK P+ Y+DL 
Sbjct: 119 TVGPVVNLWGFGPEKRPEKQPTPEQLAERQAWVGIDKITLDTNKEKATLSKALPQVYVDL 178 

Query: 186 SSIAKGFGVDKVAGELEKYGIQNYLVEIGGELHGKGKNAHGEPWRIGIEQPNIIQGGNTQ 245 
SSIAKGFGVD+VA +LE+ QNY+VEIGGE+ KGKN G+PW+I IE+P + 

Sbjct: 179 SSIAKGFGVDQVAEKLEQLNAQNYMVEIGGEIRAKGKNIEGKPWQIAIEKPTTTGERAVE 238 
Query: 246 IIVPLNNRSLATSGDYRIFHVDKNGKRLSHIINPNNKRPISHNLASISVVSDSAMTADGL 305 

++ LNN +A+SGDYRI+ ++NGKR +H I+P PI H+LASI+V++ ++MTADGL 
Sbjct: 239 AVIGLNNMGMASSGDYRIY-FEENGKRFAHEIDPKTGYPIQHHLASITVLAPTSMTADGL 297 

Query: 306 STGLFVLGETEALRLAEQEKLAVFLIVRDKDGYRTAMSSEFAKL 349 

STGLFVLGE +AL +AE+ LAV+LI+R +G+ T SS F KL 
Sbjct: 298 STGLFVLGEDKALEVAEKNNLAVYLIIRTDNGFVTKSSSAFKKL 341 
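The percentages in the search summary above follow directly from the reported counts. A minimal arithmetic check, assuming the program truncates to a whole percent rather than rounding (181/344 is about 52.6%, yet 52% is reported); the helper name `truncated_percent` is hypothetical:

```python
def truncated_percent(count: int, total: int) -> int:
    """Whole-number percentage, truncated rather than rounded (an assumption)."""
    return (100 * count) // total


alignment_length = 344  # aligned columns reported in the summary line above
summary = {
    "Identities": truncated_percent(181, alignment_length),
    "Positives": truncated_percent(247, alignment_length),
    "Gaps": truncated_percent(4, alignment_length),
}
print(summary)
```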

Based on this analysis, it is predicted that the proteins from N.meningitidis and N.gonorrhoeae, and 
their epitopes, could be useful antigens for vaccines or diagnostics, or for raising antibodies. 



Example 54 

The following partial DNA sequence was identified in N.meningitidis <SEQ ID 451>: 

1 . . CCGTGCCGCC GACAGGGCGA CGACGTGTAT GCGGCGCACG CGTCCCGTCA 

51 AAAATTGTGG CTGCGCTTCA TCGGCGGCCG GTCGCATCAA AATATACGGG 

101 GCGGCGCGGC TGCGGACGGG TGGCGCAAAG GCGTGCAAAT CGGCGGCGAG 

151 GTGTTTGTAC GGCAAAATGA AGGCAGCCkA yTGGCAATCG GCGTGATGGG 

201 CGGCAGGGCC GGCCAGCACG CwTCAGTCAA CGGCAAAGGC GGTGCGGCAG 

251 gCAGTGATTT GTATGGTTAT GgCGGGGgTG TTTATGCTgC GTGGCATCAG 

301 TTGCGCGATA AACAAACGGG TgCGTATTTG GACGGCTGGT TGCAATACCA 

351 ACGTTTCAAA CACCGCATCA ATGATGAAAA CCGTGCGGAA CgCTACAAAA 

401 CCAAAGGTTG GACGGCTTCT GTCGAAGGCG GCTACAACGC GCTTGTGGCG 

451 GAAGGCATTG TCGGAAAAGG CAATAATGTG CGGTTTTACC TACAACCGCA 

501 GgCGCAGTTT ACCTACTTGG GCGTAAACGG CGGCTTTACC GACAGCGAGG 

551 GGACGGCGGT CGGACTGCTC GGCAGCGGTC AGTGGCAAAG CCGCGCCGGC 

601 AtTCGGGCAA AAACCCGTTT TGCTTTGCGT AACGGTGTCA ATCTTCAGCC 

651 TTTTGCCGCT TTTAATGTtt TGCACAGGTC AAAATCTTTC GGCGTGGAAA 

701 TGGACGGCGA AAAACAGACG CTGGCAGGCA GGACGGCACT CGAAGGGCGG 
751 TTCGGTATTG AAGCCGGTTG GAAAGGCCAT ATGTCCGCA. . 

This corresponds to the amino acid sequence <SEQ ID 452; ORF35>: 



1 . . PCRRQGDDVY AAHASRQKLW LRFIGGRSHQ NIRGGAAADG WRKGVQIGGE 

51 VFVRQNEGSX LAIGVMGGRA GQHASVNGKG GAAGSDLYGY GGGVYAAWHQ 

101 LRDKQTGAYL DGWLQYQRFK HRINDENRAE RYKTKGWTAS VEGGYNALVA 

151 EGIVGKGNNV RFYLQPQAQF TYLGVNGGFT DSEGTAVGLL GSGQWQSRAG 

201 IRAKTRFALR NGVNLQPFAA FNVLHRSKSF GVEMDGEKQT LAGRTALEGR 

251 FGIEAGWKGH MSA. . 

Computer analysis of this amino acid sequence gave the following results: 

Homology with putative secreted VirG-homologue of N. meningitidis (accession number A32247) 
ORF and virg-h protein show 51% aa identity in 261aa overlap: 

Orf35 5 QGDDVYAAHASRQKLWLRFIGGRSHQNIRGGAA-ADGWRKGVQIGGEVFVRQNEGSXLAI 63 

+ D++ R+ LWLR I G S+Q ++G A +G+RKGVQ+GGEVF QNE + L+I 

virg-h 396 KNSDIFDRTLPRKGLWLRVIDGHSNQWVQGKTAPVEGYRKGVQLGGEVFTWQNESNQLSI 455 

Orf35 64 GVMGGRAGQHASVNGKG—GAAGSDLYGYGGGVYAAWHQLRDKQTGAYLDGWLQYQRFKH 121 

G+MGG+A Q ++ + ++ G+G GVYA WHQL+DKQTGAY D W+QYQRF+H 

virg-h 456 GLMGGQAEQRSTFHNPDTDNLTTGNVKGFGAGVYATWHQLQDKQTGAYADSWMQYQRFRH 515 

Orf35 122 RINDENRAERYKTKGWTASVEGGYNALVAEGIVGKGNNVRFYLQPQAQFTYLGVNGGFTD 181 

RIN E+ ER+ +KG TAS+E GYNAL+AE KGN++R YLQPQAQ TYLGVNG F+D 
virg-h 516 RINTEDGTERFTSKGITASIEAGYNALLAEHFTKKGNSLRVYLQPQAQLTYLGVNGKFSD 575 






Orf35 182 SEGTAVGLLGSGQWQSRAGIRAKTRFALRNGVNLQPFAAFNVLHRSKSFGVEMDGEKQTL 241 

SE V LLGS Q Q+R G++AK +F+L + ++PFAA N L+ +K FGVEMDGE++ + 
virg-h 576 SENAHVNLLGSRQLQTRVGVQAKAQFSLYKNIAIEPFAAVNALYHNKPFGVEMDGERRVI 635 






Orf35 242 AGRTALEGRFGIEAGWKGHMS 262 

+TA+E + G+ K H++ 
virg-h 636 NNKTAIESQLGVAVKIKSHLT 656 



Homology with a predicted ORF from N.meningitidis (strain A) 

ORF35 shows 96.9% identity over a 259aa overlap with an ORF (ORF35a) from strain A of N. 
meningitidis: 

10 20 30 

orf 35 .pep PCRRQGDDVYAAHASRQKLWLRFIGGRSHQNIRG 

: I I I I I I I t I I I I I [ I I I I I t M I I I t I 
orf 35a QRLAIPEAEAVLYAQQAYAANTLFGLRAADRGDDVYAADPSRQKLWLRFIGGRSHQNIRG 
310 320 330 340 350 360 


40 50 60 70 80 90 

orf 35 . pep GAAADGWRKGVQIGGEVFVRQNEGSXLAIGVMGGRAGQHASVNGKGGAAGSDLYGYGGGV 
Mini I I I I I II I I i I II I I I II I I I I I I 1 I II II I I II I I I I I I II i I : II i III 
orf 35a GAAADGRRKGVQIGGEVFVRQNEGSRLAIGVMGGRAGQHASVNGKGGAAGSYLHGYGGGV 
370 380 390 400 410 420 



100 110 120 130 140 150 

orf 35 . pep YAAWHQLRDKQTGAYLDGWLQYQRFKHRINDENRAERYKTKGWTASVEGGYNALVAEGIV 
t I I I I I M I I I I I I I I 1 I I I i I I I I I I I I II I II t II I I t t I I I I I II I M I I I I II t : I 
orf 35a YAAWHQLRDKQTGAYLDGWLQYQRFKHRINDENRAERYKTKGWTASVEGGYNALVAEGVV 

430 440 450 460 470 480 



160 170 180 190 200 210 

orf 35.pep GKGNNVRFYLQPQAQFTYLGVNGGFTDSEGTAVGLLGSGQWQSRAGIRAKTRFALRNGVN 
t t II t t II t I I II t i I t II M I I I I I I t I II M II I I I I I I I II I I II I I I I I II II 11 I 

orf 35a GKGNNVRFYLQPQAQFTYLGVNGGFTDSEGTAVGLLGSGQWQSRAGIRAKTRFALRNGVN 
490 500 510 520 530 540 






220 230 240 250 260 

orf 35.pep LQPFAAFNVLHRSKSFGVEMDGEKQTLAGRTALEGRFGIEAGWKGHMSA 
|||||||||||||||||||||||||||||||||||||||||||||||||
orf35a LQPFAAFNVLHRSKSFGVEMDGEKQTLAGRTALEGRFGIEAGWKGHMSARIGYGKRTDGD 
550 560 570 580 590 600 

orf35a KEAALSLKWLFX 
610 620 

The complete length ORF35a nucleotide sequence <SEQ ID 453> is: 

1 ATGTTCAGAG CTCAGCTTGG TTCAAATACT CGTTCTACCA AAATCGGCGA 

51 CGATGCCGAT TTTTCATTTT CAGACAAGCC GAAACCCGGC ACTTCCCATT 

101 ATTTTTCCAG CGGTAAAACC GATCAAAATT CATCCGAATA TGGGTATGAC 

151 GAAATCAATA TCCAAGGTAA AAACTACAAT AGCGGCATAC TCGCCGTCGA 

201 TAATATGCCC GTTGTTAAGA AATATATTAC AGATACTTAC GGGGATAATT 

251 TAAAGGATGC GGTTAAGAAG CAATTACAGG ATTTATACAA AACAAGACCC 

301 GAAGCTTGGG AAGAAAATAA AAAACGGACT GAGGAGGCGT ATATAGAACA 

351 GCTTGGACCA AAATTTAGTA TACTCAAACA GAAAAACCCC GATTTAATTA 

401 ATAAATTGGT AGAAGATTCC GTACTCACTC CTCATAGTAA TACATCACAG 

451 ACTAGTCTCA ACAACATCTT CAATAAAAAA TTACACGTCA AAATCGAAAA 

501 CAAATCCCAC GTCGCCGGAC AGGTGTTGGA ACTGACCAAG ATGACGCTGA 

551 AAGATTCCCT TTGGGAACCG CGCCGCCATT CCGACATCCA TATGCTGGAA 

601 ACTTCCGATA ATGCCCGCAT CCGCCTGAAC ACGAAAGATG AAAAACTGAC 

651 CGTCCATAAA GCGTATCAGG GCGGTGCGGA TTTCCTGTTC GGCTACGACG 

701 TGCGGGAGTC GGACAAACCC GCCCTGACCT TTGAAGAAAA AGTCAGCGGA 

751 CAATCCGGCG TGGTTTTGGA ACGCCGGCCG GAAAATCTGA AAACGCTCGA 

801 CGGGCGCAAA CTGATTGCGG CGGAAAAGGC AGACTCTAAT TCGTTTGCGT 

851 TTAAACAAAA TTACCGGCAG GGACTGTACG AATTATTGCT CAAGCAATGC 

901 GAAGGCGGAT TTTGCTTGGG CGTGCAGCGT TTGGCTATCC CCGAGGCGGA 

951 AGCGGTTTTA TATGCCCAAC AGGCTTATGC GGCAAATACT TTGTTCGGGC 

1001 TGCGTGCCGC CGACAGGGGC GACGACGTGT ATGCCGCCGA TCCGTCCCGT 

1051 CAAAAATTGT GGCTGCGCTT CATCGGCGGC CGGTCGCATC AAAATATACG 

1101 GGGCGGCGCG GCTGCGGACG GGCGGCGCAA AGGCGTGCAA ATCGGCGGCG 

1151 AGGTGTTTGT ACGGCAAAAT GAAGGCAGCC GGCTGGCAAT CGGCGTGATG 

1201 GGCGGCAGGG CTGGCCAGCA CGCATCAGTC AACGGCAAAG GCGGTGCGGC 

1251 AGGCAGTTAT TTGCATGGTT ATGGCGGGGG TGTTTATGCT GCGTGGCATC 

1301 AGTTGCGCGA TAAACAAACG GGTGCGTATT TGGACGGCTG GTTGCAATAC 

1351 CAACGTTTCA AACACCGCAT CAATGATGAA AACCGTGCGG AACGCTACAA 

1401 AACCAAAGGT TGGACGGCTT CTGTCGAAGG CGGCTACAAC GCGCTTGTGG 

1451 CGGAAGGCGT TGTCGGAAAA GGCAATAATG TGCGGTTTTA CCTGCAACCG 

1501 CAGGCGCAGT TTACCTACTT GGGCGTAAAC GGCGGCTTTA CCGACAGCGA 

1551 GGGGACGGCG GTCGGACTGC TCGGCAGCGG TCAGTGGCAA AGCCGCGCCG 

1601 GCATTCGGGC AAAAACCCGT TTTGCTTTGC GTAACGGTGT CAATCTTCAG 

1651 CCTTTTGCCG CTTTTAATGT TTTGCACAGG TCAAAATCTT TCGGCGTGGA 

1701 AATGGACGGC GAAAAACAGA CGCTGGCAGG CAGGACGGCG CTCGAAGGGC 

1751 GGTTCGGCAT TGAAGCCGGT TGGAAAGGCC ATATGTCCGC ACGCATCGGA 

1801 TACGGCAAAA GGACGGACGG CGACAAAGAA GCCGCATTGT CGCTCAAATG 

1851 GCTGTTTTGA 

This encodes a protein having amino acid sequence <SEQ ID 454>: 

1 MFRAQLGSNT RSTKIGDDAD FSFSDKPKPG TSHYFSSGKT DQNSSEYGYD 

51 EINIQGKNYN SGILAVDNMP VVKKYITDTY GDNLKDAVKK QLQDLYKTRP 

101 EAWEENKKRT EEAYIEQLGP KFSILKQKNP DLINKLVEDS VLTPHSNTSQ 

151 TSLNNIFNKK LHVKIENKSH VAGQVLELTK MTLKDSLWEP RRHSDIHMLE 

201 TSDNARIRLN TKDEKLTVHK AYQGGADFLF GYDVRESDKP ALTFEEKVSG 

251 QSGVVLERRP ENLKTLDGRK LIAAEKADSN SFAFKQNYRQ GLYELLLKQC 

301 EGGFCLGVQR LAIPEAEAVL YAQQAYAANT LFGLRAADRG DDVYAADPSR 

351 QKLWLRFIGG RSHQNIRGGA AADGRRKGVQ IGGEVFVRQN EGSRLAIGVM 

401 GGRAGQHASV NGKGGAAGSY LHGYGGGVYA AWHQLRDKQT GAYLDGWLQY 

451 QRFKHRINDE NRAERYKTKG WTASVEGGYN ALVAEGVVGK GNNVRFYLQP 

501 QAQFTYLGVN GGFTDSEGTA VGLLGSGQWQ SRAGIRAKTR FALRNGVNLQ 

551 PFAAFNVLHR SKSFGVEMDG EKQTLAGRTA LEGRFGIEAG WKGHMSARIG 

601 YGKRTDGDKE AALSLKWLF* 

Homology with a predicted ORF from N. gonorrhoeae 

ORF35 shows 51.7% identity over a 261aa overlap with a predicted ORF (ORF35ngh) from N. 
gonorrhoeae: 



orf 35 .pep PCRRQGDDVYAAHASRQKLWLRFIGGRSHQNIRG 34 

:::!:: I : I I I I I I : I : I : : I 

orf35ngh FTKVQERDDIAIYAQQAQAANTLFALRLNDKNSDIFDRTLPRKGLWLRVIDGHSNQWVQG 370 
orf 35 . pep GAA-ADGWRKGVQIGGEVFVRQNEGSXLAIGVMGGRAGQHASVNGKG--GAAGSDLYGYG 91 

: I : : I : M I I t : i I I I ) : III:: I : I I : i I I : I t : : : : : : : : : I : I 
orf35ngh KTAPVEGYRKGVQLGGEVFTWQNESNQLSIGLMGGQAEQRSTFRNPDTDNLTTGNVKGFG 430 

orf 35 . pep GGVYAAWHQLRDKQTGAYLDGWLQYQRFKHRINDENRAERYKTKGWTASVEGGYNALVAE 151 

: II I I : I II i : M I I M t : I : I : I I I t I : I I I I I : I I : : M I II : I : I I I II : t I 
orf 35ngh AGVYATWHQLQDKQTGAYVDSWMQYQRFRHRINTEYATERFTSKGITASIEAGYNALLAE 4 90 

orf 35 . pep GIVGKGNNVRFYLQPQAQFTYLGVNGGFTDSEGTAVGLLGSGQWQSRAGIRAKTRFALRN 211 

: : I I I : : i I I I I I 1 I : i I I II I I I : t II : : I : I I t I I I N : I : : I) : : II : I 
orf35ngh HFTKKGNSLRVYLQPQAQLTYLGVNGKFSDSENAQVNLLGSRQLQSRVGVQAKAQFAFTN 550 

orf 35 . pep GVNLQPFAAFNVLHRSKSFGVEMDGEKQTLAGRTALEGRFGIEAGWKGHMSA 263 

M::lll:l I ::::! IIII:II::::: ::|::| ::|: I |:|:: 
orf35ngh GVTFQPFVAVNSIYQQKPFGVEIDGDRRVINNKTVIETQLGVAAKIKSHLTLQASFNRQT 610 

A partial ORF35ngh nucleotide sequence <SEQ ID 455> is predicted to encode a protein having 
partial amino acid sequence <SEQ ID 456>: 

1 ..KKLRDRNSEY WKEETYHIKS NGRTYPNIPA LFPECHPFDPF ENINNSKKIS 

51 FYDKEYTEDY LVGFARGFGV EKRNGEEEKP LRQYFKDCVN TENSNNDNCK 

101 ISSFGNYGPI LIKSDIFALA SQIKNSHINS EILSVGNYIE WLRPTLNKLT 

151 GWQEHLYAGL DPFHYIEVTD NSHVIGQTID LGALELTNSL WKPRWNSNID 

201 YLITKNAEIR FNTKNESLLV KEDYAGGARF RFAYDLKDKV PEIPVLTFEK 

251 NITGTSDIIF EGKALDNLKH LDGHQIVKVN DTADKDAFRL SSKYRKGIYT 

301 LSLQQRPEGF FTKVQERDDI AIYAQQAQAA NTLFALRLND KNSDIFDRTL 

351 PRKGLWLRVI DGHSNQWVQG KTAPVEGYRK GVQLGGEVFT WQNESNQLSI 

401 GLMGGQAEQR STFRNPDTDN LTTGNVKGFG AGVYATWHQL QDKQTGAYVD 

451 SWMQYQRFRH RINTEYATER FTSKGITASI EAGYNALLAE HFTKKGNSLR 

501 VYLQPQAQLT YLGVNGKFSD SENAQVNLLG SRQLQSRVGV QAKAQFAFTN 

551 GVTFQPFVAV NSIYQQKPFG VEIDGDRRVI NNKTVIETQL GVAAKIKSHL 

601 TLQASFNRQT SKHHHAKQGA LNLQWTF* 

Based on this prediction, these proteins from N.meningitidis and N.gonorrhoeae, and their epitopes, 
could be useful antigens for vaccines or diagnostics, or for raising antibodies. 



Example 55 

The following partial DNA sequence was identified in N.meningitidis <SEQ ID 457>: 

1 . . GCGGAATATG TTCAGTTCTC TATAGATTTG TTCAGTGTGG GTAAATCGGG 

51 GGGCGGTATA CCTAAGGCTA AGCCTGTGTT TGATGCGAAA CCGAGATGGG 

101 AGGTTGATAG GAAGCTTAAT AAATTGACAA CTCGTGAGCA GGTGGAGAAA 

151 AATGTTCAGG AAACGAGAAG AAGGAGTCAG AGTAGTCAGT TTAAAGCCCA 

201 TGCGCAACGA GAATGGGAAA ATAAAACAGG GTTAGATTTT AATCATTTTA 

251 TAGGTGGTGA TATCAATAAA AAAGGCACAG TAACAGGAGG GCATAGTCTA 

301 ACCCGTGGTG ATGTACGGGT GATACAACAA ACCTCGGCAC CTGATAAACA 

351 TGGGGT.TTA TCAAGCGACA GTGGAAATTN A 

This corresponds to the amino acid sequence <SEQ ID 458; ORF46>: 

1 ..AEYVQFSIDL FSVGKSGGGI PKAKPVFDAK PRWEVDRKLN KLTTREQVEK 

51 NVQETRRRSQ SSQFKAHAQR EWENKTGLDF NHFIGGDINK KGTVTGGHSL 

101 TRGDVRVIQQ TSAPDKHGXL SSDSGNX 

Further work revealed further partial nucleotide sequence <SEQ ID 459>: 

1 . . GCAGTGTGCC TnCCGATGCA TGCACACGCC TCAnATTTGG CAAACGATTC 

51 TTTTATCCGG CAGGTTCTCG ACCGTCAGCA TTTCGAACCC GACGGGAAAT 

101 ACCACCTATT CGGCAGCAGG GGGGAACTTG CCGAGCGCCA GTCTCATATC 

151 GGATTGGGAA AAATACAAAG CCATCAGTTG GGCAACCTGA TGATTCAACA 

201 GGCGGCCATT AAAGGAAATA TCGGCTACAT TGTCCGCTTT TCCGATCACG 

251 GGCACGAAGT CCATTCCCCs TTCGACAACC ATGCCTCACA TTCCGATTCT 

301 GATGAAGCCG GTAGTCCCGT TGACGGATTT AGCCTTTACC GCATCCATTG 

351 GGACGGATAC GAACACCATC CCGCCGACGG CTATGACGGG CCACAGGGCG 

401 GCGGCTATCC CGCTCCCAAA GGCGCGAGGG ATATATACAG TTACGACATA 
451 AAAGGCGTTG CCCAAAATAT CCGCCTCAAC CTGACCGACA ACCGCAGCAC 

501 CGGACAACGG CTTGCCGACC GTTTCCACAA TGCCGGTAGT ATGCTGACGC 

551 AAGGAGTAGG CGACGGATTC AAACGCGCCA CCCGATACAG CCCCGAGCTG 

601 GACAGATCGG GCAATGCCGC CGAAGCCTTC AACGGCACTG CAGATATCGT 

651 TAAAAACATC ATCGGCGCTG CAGGAGAAAT TGT 

This corresponds to the amino acid sequence <SEQ ID 460; ORF46-1>: 



1 . . AVCLPMHAHA SXLANDSFIR QVLDRQHFEP DGKYHLFGSR GELAERQSHI 

51 GLGKIQSHQL GNLMIQQAAI KGNIGYIVRF SDHGHEVHSP FDNHASHSDS 

101 DEAGSPVDGF SLYRIHWDGY EHHPADGYDG PQGGGYPAPK GARDIYSYDI 

151 KGVAQNIRLN LTDNRSTGQR LADRFHNAGS MLTQGVGDGF KRATRYSPEL 

201 DRSGNAAEAF NGTADIVKNI IGAAGEI 

Computer analysis of this amino acid sequence gave the following results: 



Homology with a predicted ORF from N.gonorrhoeae 

ORF46 shows 98.2% identity over a 111aa overlap with a predicted ORF (ORF46ng) from N. 
gonorrhoeae: 

orf46.pep AEYVQFSIDLFSVGKSGGGIPKAKPVFDAKPRWEVDRKLNKLTTR 45 

I I M I I I i I I I I I t I I I I M M 1 i I i I M I 
orf 4 6ng PKTGVPFDGKGFPNFEKHVKYDTKLDIQELSGGGIPKAKPVFDAKPRWEVDRKLNKLTTR 217 



orf 4 6 , pep EQVEKNVQETRRRSQSSQFKAHAQREWENKTGLDFNHFIGGDINKKGTVTGGHSLTRGDV 105 

i I i I I t I M M I I I i I I i I I I I t I I I I I I I I I I I I I I t I I t I I t ! I i : 1 M t I I I t I t I I 
orf46ng EQVEKNVQETRRRSQSSQFKAHAQREWENKTGLDFNHFIGGDTNKKGAVTGGHSLTRGDV 277 

orf 4 6. pep RVIQQTSAPDKHGXLSSDSGN 126 

1 1 i 1 1 i i 1 1 1 1 1 1 1 1 n I i I 

orf4 6ng RVIQQTSAPDKHGVLSSDSGN 298 

A partial ORF46ng nucleotide sequence <SEQ ID 461> is predicted to encode a protein having 
partial amino acid sequence <SEQ ID 462>: 



1 ..RRLKHCCHAR LGSAFHRKQD GAHQRFGRYG ATQRLCRSSH PRLGSPKPQC 

51 RTRHRSRQQY LYGSHPHQRD WSCPGKIQLG RHHGTSCRAV ADXRDRICER 

101 EIRRQRQXCR CRLGKIPSLS IPKYPLKLEQ RYGKENITSS TVPPSNGKNV 

151 KLADQRHPKT GVPFDGKGFP NFEKHVKYDT KLDIQELSGG GIPKAKPVFD 

201 AKPRWEVDRK LNKLTTREQV EKNVQETRRR SQSSQFKAHA QREWENKTGL 

251 DFNHFIGGDI NKKGAVTGGH SLTRGDVRVI QQTSAPDKHG VLSSDSGN* 

Further work revealed the complete gonococcal DNA sequence <SEQ ID 463>: 



1 TTGGGCATTT CCCGCAAAAT ATCCCTTATT CTGTCCATAC TGGCAGTGTG 

51 CCTGCCGATG CATGCACACG CCTCAGATTT GGcaAACGAT CCCTTTATCC 

101 GgCaggttcT CGaccGTCAG CATTTCGaac ccgacggGAa ATACCaCCTA 

151 TTcggCaGCA GGGGGGAGCT TgccnagcGC aacggccATa tcggattggG 

201 aaacaTAcaa Agccatcagt tGggccacct gatgattcaa caggcggccg 

251 ttgaaggaaA TAtcgGctac attgtccgct tttccgatca cgggcacaaa 

301 ttccattcgc ccttcGAcaa ccaTGCCTCA CATTCCGATT CTGACGAAGC 

351 CGGTAGTCCC GTTGACGGAT TCAGCCTTTA CCGCATCCAT TGGGACGGAT 

401 ACGAACACCA TCCCGCCGAC GGCTATGACG GGCCACAGGG CGGCGGCTAT 

451 CCCGCTCCCA AAGGCGCGAG GGATATATAC AGCTACGACA TAAAAGGCGT 

501 TGCCCAAAAT ATCCGCCTCA ACCTGACCGA CAACCGCAGC ACCGGACAAC 

551 GGCTTGCCGA CCGTTTCCAC AATGCCGGCG CTATGCTGAC GCAAGGAGTA 

601 GGCGACGGAT TCAAACGCGC CACCCGATAC AGCCCCGAGC TGGACAGATC 

651 GGGCAATGCc gccGAAGCCT TCAACGGCAC TGCAGATATC GTCAAAAACA 

701 TCATCGGCGC GGCAGGAGAA ATTGTCGGCG CAGGCGATGC CGTGCagGGT 

751 ATAAGCGAAG GCTCAAACAT TGCTGTCATG CACGGCTTGG GTCTGCTTTC 

801 CACCGAAAAC AAGATGGCGC GCATCAACGA TTTGGCAGAT ATGGCGCAAC 

851 TCAAAGACTA TGCCGCAGCA GCCATCCGCG ATTGGGCAGT CCAAAACCCC 

901 AATGCCGCAC AAGGCATAGA AGCCGTCAGC AATATCTTTA TGGCAGCCAT 

951 CCCCATCAAA GGGATTGGAG CTGTCCGGGG AAAATACGGC TTGGGCGGCA 

1001 TCACGGCACA TCCTGTCAAG CGGTCGCAGA TGGGCGCGAT CGCATTGCCG 

1051 AAAGGGAAAT CCGCCGTCAG CGACAATTTT GCCGATGCGG CATACGCCAA 

1101 ATACCCGTCC CCTTACCATT CCCGAAATAT CCGTTCAAAC TTGGAGCAGC 

1151 GTTACGGCAA AGAAAACATG AGCTCCTCAA CCGTGGCGCC GTCAAACGGC 

1201 AAAAATGTCA AACTGGCAGA CCAACGCCAC CCGAAGACAG GCGTACCGTT 

1251 TGACGGTAAA GGGTTTCCGA ATTTTGAGAA GCACGTGAAA TATGATACGA 

1301 AGCTCGATAT TCAAGAATTA TCGGGGGGCG GTATACCTAA GGCTAAGCCT 

1351 GTGTTTGATG CGAAACCGAG ATGGGAGGTT GATAGGAAGC TTAATAAATT 

1401 GACAACTCGT GAGCAGGTGG AGAAAAATGT TCAGGAAACG AGAAGAAGGA 

1451 GTCAGAGTAG TCAGTTTAAA GCCCATGCGC AACGAGAATG GGAAAATAAA 

1501 ACAGGGTTAG ATTTTAATCA TTTTATAGGT GGTGATATCA ATAAGAAAGG 

1551 CACAGTAACA GGAGGGCATA GTCTAACCCG TGGTGATGTA CGGGTGATAC 

1601 AACAAACCTC GGCACCTGAT AAACATGGGG TTTATCAAGC GACAGTGGAA 

1651 ATTAAAAAGC CTGATGGAAG TTGGGAGGTG AAAACGAAAA AAGGTGGGAA 

1701 AGTGATGACC AAGCACACCA TGTTCCCAAA AGATTGGGAT GAGGCTAGAA 

1751 TTAGGGCTGA AGTTACTTCG GCTTGGGAAA GTAGAATAAT GCTTAAGGAT 

1801 AATAAATGGC AGGGTACAAG TAAATCGGGT ATTAAAATAG AAGGATTTAC 

1851 CGAACCTAAT AGAACAGCAT ATCCCATTTA TGAATAG 

This corresponds to the amino acid sequence <SEQ ID 464; ORF46ng-1>: 



1 LGISRKISLI LSILAVCLPM HAHA SDLAND PFIRQVLDRQ HFEPDGKYHL 

51 FGSRGELAXR NGHIGLGNIQ SHQLGHLMIQ QAAVEGNIGY IVRFSDHGHK 

101 FHSPFDNHAS HSDSDEAGSP VDGFSLYRIH WDGYEHHPAD GYDGPQGGGY 

151 PAPKGARDIY SYDIKGVAQN IRLNLTDNRS TGQRLADRFH NAGAMLTQGV 

201 GDGFKRATRY SPELDRSGNA AEAFNGTADI VKNIIGAAGE IVGAGDAVQG 

251 ISEGSNIAVM HGLGLLSTEN KMARINDLAD MAQLKDYAAA AIRDWAVQNP 

301 NAAQGIEAVS NIFMAAIPIK GIGAVRGKYG LGGITAHPVK RSQMGAIALP 

351 KGKSAVSDNF ADAAYAKYPS PYHSRNIRSN LEQRYGKENI TSSTVPPSNG 

401 KNVKLADQRH PKTGVPFDGK GFPNFEKHVK YDTKLDIQEL SGGGIPKAKP 

451 VFDAKPRWEV DRKLNKLTTR EQVEKNVQET RRRSQSSQFK AHAQREWENK 

501 TGLDFNHFIG GDINKKGTVT GGHSLTRGDV RVIQQTSAPD KHGVYQATVE 

551 IKKPDGSWEV KTKKGGKVMT KHTMFPKDWD EARIRAEVTS AWESRIMLKD 

601 NKWQGTSKSG IKIEGFTEPN RTAYPIYE* 

ORF46ng-1 and ORF46-1 show 94.7% identity in 227 aa overlap: 



10 20 30 40 

orf 4 6-1 . pep AVCLPMHAHASXLANDSFIRQVLDRQHFEPDGKYHLFGSRGELAER 

t ) I I I I I I I I I MM M M M M M M M M t M M M M M I 
orf 46ng-1 LGISRKISLILSILAVCLPMHAHASDLANDPFIRQVLDRQHFEPDGKYHLFGSRGELAXR 

10 20 30 40 50 60 



50 60 70 80 90 100 

orf 4 6-1 . pep QSHIGLGKIQSHQLGNLMIQQAAIKGNIGYIVRFSDHGHEVHSPFDNHASHSDSDEAGSP 
: : M M I : M M M t : M M M t : : M M M M M I M I : I M M M M M M M M M 
orf46ng-l NGHIGLGNIQSHQLGHLMIQQAAVEGNIGYIVRFSDHGHKFHSPFDNHASHSDSDEAGSP 

70 80 90 100 110 120 



110 120 130 140 150 160 

or f 4 6-1 . pep VDGFSLYRIHWDGYEHHPADGYDGPQGGGYPAPKGARDI YSYDIKGVAQNIRLNLTDNRS 
M M M M M M M M M M i M M M M M M M M M M M M M I M M M M M M 
or f 4 6ng- 1 VDGFSLYRIHWDGYEHHPADGYDGPQGGGYPAPKGARD I YSYDIKGVAQNIRLNLTDNRS 

130 140 150 160 170 180 



170 180 190 200 210 220 

orf 4 6-1 . pep TGQRLADRFHNAGSMLTQGVGDGFKRATRYSPELDRSGNAAEAFNGTADIVKNIIGAAGE 
I M M M M M M : M M t M I M M M M I M M M M M M M M M M I M M M N 
orf4 6ng-l TGQRLADRFHNAGAMLTQGVGDGFKRATRYSPELDRSGNAAEAFNGTADIVKNIIGAAGE 
190 200 210 220 230 240 



orf 46-1. pep I 
I 

orf4 6ng-l IVGAGDAVQGISEGSNIAVMHGLGLLSTENKMARINDLADMAQLKDYAAAAIRDWAVQNP 
250 260 270 280 290 300 

Homology with a predicted ORF from N.meningitidis (strain A) 

ORF46ng-1 shows 87.4% identity over a 486aa overlap with an ORF (ORF46a) from strain A of 
N. meningitidis: 

10 20 30 40 50 60 
orf46a.pep LGISRKISLILSILAVCLPMHAHASDLANDSFIRQVLDRQHFEPDGKYHLFGSRGELAER 
I I I I I I I I I I I I I I I I I I I I t i I I i I M I t I I I I I i I M I t I I M t i I I I I I i I I I t I 
orf4 6ng-l LGISRKISLILSILAVCLPMHAHASDLANDPFIRQVLDRQHFEPDGKYHLFGSRGELAXR 

10 20 30 40 50 60 



70 80 90 100 110 120 

orf46a.pep SGHIGLGNIQSHQLGNLFIQQAAIKGNIGYIVRFSDHGHEVHSPFDNHASHSDSDEAGSP 
; I I I I I I I i I I I I 1 [: I : 1 I I I I :: I I I I t I I I I I I I 1 I : It I I I I I I I I I I M I I I t ) 
orf46ng-l NGHIGLGNIQSHQLGHLMIQQAAVEGNIGYIVRFSDHGHKFHSPFDNHASHSDSDEAGSP 

70 80 90 100 110 120 



130 140 150 160 170 180 

orf 4 6a . pep VDGFSLYRIHWDGYEHHPADGYDGPQGGGYPAPKGARDIYSYDIKGVAQNIRLNLTDNRS 
M I I I 1 I I i I I I I i I I I I I I I I I M I I i I I I I I I I I I I I I M I I I I I I I I i I I I I I i M t 
orf4 6ng-l VDGFSLYRIHWDGYEHHPADGYDGPQGGGYPAPKGARDIYSYDIKGVAQNIRLNLTDNRS 

130 140 150 160 170 180 



190 200 210 220 230 240 

orf 4 6a . pep TGQRLVDRFHNTGSMLTQGVGDGFKRATRYSPELDRSGNAAEAFNGTADIVKNIIGAAGE 
|||t|:ltlM:|:lllllll)lltiMillllllMlllillMllllllltllllllt 
orf4 6ng-l TGQRLADRFHNAGAMLTQGVGDGFKRATRYSPELDRSGNAAEAFNGTADIVKNIIGAAGE 

190 200 210 220 230 240 



250 260 270 280 290 300 

orf 4 6a . pep IVGAGDAVQGISEGSNIAVMHGLGLLSTENKMARINDLADMAQLKDYAAAAIRDWAVQNP 
I I I i I I M I I I i 1 I M I I i t I ) i M I I I t I t I I I I i I I t I I I I I I I i I i 1) I I M I i I I I 
orf4 6ng-l IVGAGDAVQGISEGSNIAVMHGLGLLSTENKMARINDLADMAQLKDYAAAAIRDWAVQNP 

250 260 270 280 290 300 

310 320 330 340 350 360 

orf 4 6a . pep NAAQGIEAVSNIFTAVIPVKGIGAVRGKYGLGGITAHPVKRSQMGEIALPKGKSAVSDNF 
I I I I t i I I I M I i I : I I : I I M 1 I I I M i I I I I t i I I I 1 M t I I I I i M I I t i I I I I I 
orf46ng-l NAAQGIEAVSNIFMAAIPIKGIGAVRGKYGLGGITAHPVKRSQMGAIALPKGKSAVSDNF 

310 320 330 340 350 360 



370 380 390 400 410 420 

orf46a.pep ADAAYAKYPSPYHSRNIRSNLEQRYGKENITSSTVPPSNGKNVKLANKRHPKTKVPFDGK 
I I I I It t I I t I I I I 1 i I I I t I t I I I I I It M i i i I t I I I I I I I I t I : : t I I I I I I II II 
orf46ng-l ADAAYAKYPSPYHSRNIRSNLEQRYGKENITSSTVPPSNGKNVKLADQRHPKTGVPFDGK 

370 380 390 400 410 420 



430 440 450 460 470 

orf 4 6a. pep GFPNFEKDVKYDTRINTAVPQVN PIDEPVFN — PKGSVGSAHSWSITARIQYAKLP 

II I I I I I I t I II : : : : :: : i : I I I : I : I : : : I : I I I 
orf46ng-l GFPNFEKHVKYDTKLD — IQELSGGGIPKAKPVFDAKPRWEVDRKLN-KLTTREQVEKNV 

430 440 450 460 470 



480 490 500 510 520 530 

orf 46a. pep RQGRIRYIPPKNYSPSAPLPKGPNNGYLDKFGNEWTKGPSRTKGQEFEWDVQLSKTGREQ 
:: I t 

orf46ng-l QETRRRSQSSQFKAHAQREWENKTGLDFNHFIGGDINKKGTVTGGHSLTRGDVRVIQQTS 
480 490 500 510 520 530 

The complete length ORF46a DNA sequence <SEQ ID 465> is: 



1 TTGGGCATTT CCCGCAAAAT ATCCCTTATT CTGTCCATAC TGGCAGTGTG 

51 CCTGCCGATG CATGCACACG CCTCAGATTT GGCAAACGAT TCTTTTATCC 

101 GGCAGGTTCT CGACCGTCAG CATTTCGAAC CCGACGGGAA ATACCACCTA 

151 TTCGGCAGCA GGGGGGAACT TGCCGAGCGC AGCGGTCATA TCGGATTGGG 

201 AAACATACAA AGCCATCAGT TGGGCAACCT GTTCATCCAG CAGGCGGCCA 

251 TTAAAGGAAA TATCGGCTAC ATTGTCCGCT TTTCCGATCA CGGGCACGAA 

301 GTCCATTCCC CCTTCGACAA CCATGCCTCA CATTCCGATT CTGATGAAGC 

351 CGGTAGTCCC GTTGACGGAT TCAGCCTTTA CCGCATCCAT TGGGACGGAT 

401 ACGAACACCA TCCCGCCGAC GGCTATGACG GGCCACAGGG CGGCGGCTAT 

451 CCCGCTCCCA AAGGCGCGAG GGATATATAC AGCTACGACA TAAAAGGCGT 

501 TGCCCAAAAT ATCCGCCTCA ACCTGACCGA CAACCGCAGC ACCGGACAAC 

551 GGCTTGTCGA CCGTTTCCAC AATACCGGTA GTATGCTGAC GCAAGGAGTA 

601 GGCGACGGAT TCAAACGCGC CACCCGATAC AGCCCCGAGC TGGACAGATC 

651 GGGCAATGCC GCCGAAGCTT TCAACGGCAC TGCAGATATC GTCAAAAACA 

701 TCATCGGCGC GGCAGGAGAA ATTGTCGGCG CAGGCGATGC CGTGCAGGGT 

751 ATAAGCGAAG GCTCAAACAT TGCTGTTATG CACGGCTTGG GTCTGCTTTC 

801 CACCGAAAAC AAGATGGCGC GCATCAACGA TTTGGCAGAT ATGGCGCAAC 
851 TCAAAGACTA TGCCGCAGCA GCCATCCGCG ATTGGGCAGT CCAAAACCCC 

901 AATGCCGCAC AAGGCATAGA AGCCGTCAGC AATATCTTTA CGGCAGTCAT 

951 CCCCGTCAAA GGGATTGGAG CTGTTCGGGG AAAATACGGC TTGGGCGGCA 

1001 TCACGGCACA TCCTGTCAAG CGGTCGCAGA TGGGCGAGAT CGCATTGCCG 

1051 AAAGGGAAAT CCGCCGTCAG CGACAATTTT GCCGATGCGG CATACGCCAA 

1101 ATACCCGTCC CCTTACCATT CCCGAAATAT CCGTTCAAAC TTGGAGCAGC 

1151 GTTACGGCAA AGAAAACATC ACCTCCTCAA CCGTGCCGCC GTCAAACGGA 

1201 AAGAATGTGA AACTGGCAAA CAAACGCCAC CCGAAGACCA T^GTGCCGTT 

1251 TGACGGTAAA GGGTTTCCGA ATTTTGAAAA AGACGTAAAA TACGATACGA 

1301 GAATTAATAC CGCTGTACCA CAAGTGAATC CTATAGATGA ACCCGTCTTT 

1351 AATCCTAAAG GTTCTGTCGG ATCGGCTCAT TCTTGGTCTA TAACTGCCAG 

1401 AATTCAATAC GCAAAATTAC CAAGGCAAGG TAGAATCAGA TATATCCCAC 

1451 CTAAAAATTA CTCTCCTTCA GCACCGCTAC CAAAAGGACC TAATAATGGA 

1501 TATTTGGATA AATTTGGTAA TGAATGGACT AAAGGTCCAT CAAGAACTAA 

1551 AGGTCAAGAA TTTGAATGGG ATGTTCAATT GTCTAAAACA GGAAGAGAGC 

1601 AACTTGGATG GGCTAGTAGG GATGGTAAGC ATTTAAATAT ATCAATTGAT 

1651 GGAAAGATTA CACACAAATG A 

This corresponds to the amino acid sequence <SEQ ID 466>: 



1 LGISRKISLI LSILAVCLPM HAHASDLAND SFIRQVLDRQ HFEPDGKYHL 
51 FGSRGELAER SGHIGLGNIQ SHQLGNLFIQ QAAIKGNIGY IVRFSDHGHE 
101 VHSPFDNHAS HSDSDEAGSP VDGFSLYRIH WDGYEHHPAD GYDGPQGGGY 
151 PAPKGARDIY SYDIKGVAQN IRLNLTDNRS TGQRLVDRFH NTGSMLTQGV 
201 GDGFKRATRY SPELDRSGNA AEAFNGTADI VKNIIGAAGE IVGAGDAVQG 
251 ISEGSNIAVM HGLGLLSTEN KMARINDLAD MAQLKDYAAA AIRDWAVQNP 
301 NAAQGIEAVS NIFTAVIPVK GIGAVRGKYG LGGITAHPVK RSQMGEIALP 
351 KGKSAVSDNF ADAAYAKYPS PYHSRNIRSN LEQRYGKENI TSSTVPPSNG 
401 KNVKLANKRH PKTKVPFDGK GFPNFEKDVK YDTRINTAVP QVNPIDEPVF 
451 NPKGSVGSAH SWSITARIQY AKLPRQGRIR YIPPKNYSPS APLPKGPNNG 
501 YLDKFGNEWT KGPSRTKGQE FEWDVQLSKT GREQLGWASR DGKHLNISID 
551 GKITHK* 

Based on this analysis, including the presence in the gonococcal protein of an RGD sequence, which 
is typical of adhesins, it is predicted that the proteins from N.meningitidis and N.gonorrhoeae, and 
their epitopes, could be useful antigens for vaccines or diagnostics, or for raising antibodies. 
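
The RGD check referred to above amounts to a plain substring search. The following is a minimal sketch in Python, assuming the 60-residue gonococcal fragment exactly as printed in the last alignment block above; the function name `find_motif` is illustrative and does not come from the source, and the positions it reports are relative to the fragment, not to the full-length protein.

```python
def find_motif(sequence: str, motif: str = "RGD") -> list[int]:
    """Return the 1-based start position of every occurrence of `motif`."""
    positions = []
    start = sequence.find(motif)
    while start != -1:
        positions.append(start + 1)  # convert 0-based index to 1-based
        start = sequence.find(motif, start + 1)
    return positions


# Gonococcal fragment (orf46ng-1) copied from the last alignment block above.
ORF46NG_FRAGMENT = (
    "QETRRRSQSSQFKAHAQREWENKTGLDFNHFIGGDINKKGTVTGGHSLTRGDVRVIQQTS"
)

if __name__ == "__main__":
    print(find_motif(ORF46NG_FRAGMENT))
```

The same helper can be pointed at any of the protein listings in this section once the numbered blocks have been joined into a single string.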



Example 56 

The following partial DNA sequence was identified in N.meningitidis <SEQ ID 467>: 

1 ATGAATATTC ACACCCTGCT CTCCAAACAA TGGACGCTGC CGCCATTCCT 

51 GCCGAAACGG CTGCTGCTGT CCCTGCTGAT ACTGCTTGCC CCCAATGCGG 

101 TGTTTTGGGT TTTGGCACTG CTGACCGCCA CCGCCCGCCC GATTGTCAAT 

151 TTGGACTATC TTCCCGCCGC GCTGCTGATC GCCCTGCCTT GGCGTTTCGT 

201 CAAAATTGCC GGCGTATTGG CGTTTTGGCT GGCGGTTTTG TTTGACGGGC 

251 TGATGATGGT GATCCAACTC TTCCCTTTTA TGGATCTCAT CGGCGCCATC 

301 AACCTCGTCC CCTTCATCCT GACCGCCCCC GCCCCTTATC AGATAATGAC 

351 CGGGCTG... 

This corresponds to the amino acid sequence <SEQ ID 468; ORF48>: 



1 MNIHTLLSKQ WTLPPFLPKR LLLSLLILLA PNAVFWVLAL LTATARPIVN 
51 LDYLPAALLI ALPWRFVKIA GVLAFWLAVL FDGLMMVIQL FPFMDLIGAI 
101 NLVPFILTAP APYQIMTGL... 
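
The correspondence between a DNA listing and its amino acid listing can be checked by translating the DNA in frame. The sketch below is an assumption-laden illustration, not the method used in the source: it applies the standard genetic code, reads from the first base, and maps any codon containing an ambiguous base (such as N) to X. The names `CODON_TABLE` and `translate` are hypothetical.

```python
BASES = "TCAG"
# Standard genetic code, codons ordered TTT, TTC, TTA, TTG, TCT, ... GGG.
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"

CODON_TABLE = {
    b1 + b2 + b3: AMINO_ACIDS[16 * i + 4 * j + k]
    for i, b1 in enumerate(BASES)
    for j, b2 in enumerate(BASES)
    for k, b3 in enumerate(BASES)
}


def translate(dna: str) -> str:
    """Translate DNA in frame 1; whitespace is ignored, unknown codons give X."""
    dna = "".join(dna.split()).upper()
    usable = len(dna) - len(dna) % 3  # drop a trailing partial codon
    return "".join(
        CODON_TABLE.get(dna[i:i + 3], "X") for i in range(0, usable, 3)
    )


if __name__ == "__main__":
    # First 30 bases of <SEQ ID 467> as printed above.
    print(translate("ATGAATATTC ACACCCTGCT CTCCAAACAA"))
```

Note that the standard table translates an initial TTG or GTG literally, as L or V, which is how the listings in this section print ORFs that begin with those codons.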

Further work revealed the complete nucleotide sequence <SEQ ID 469>: 



1 ATGAATATTC ACACCCTGCT CTCCAAACAA TGGACGCTGC CGCCATTCCT 

51 GCCGAAACGG CTGCTGCTGT CCCTGCTGAT ACTGCTTGCC CCCAATGCGG 

101 TGTTTTGGGT TTTGGCACTG CTGACCGCCA CCGCCCGCCC GATTGTCAAT 

151 TTGGACTATC TTCCCGCCGC GCTGCTGATC GCCCTGCCTT GGCGTTTCGT 

201 CAAAATTGCC GGCGTATTGG CGTTTTGGCT GGCGGTTTTG TTTGACGGGC 

251 TGATGATGGT GATCCAACTC TTCCCTTTTA TGGATCTCAT CGGCGCCATC 

301 AACCTCGTCC CCTTCATCCT GACCGCCCCC GCCCCTTATC AGATAATGAC 

351 CGGGCTGTTG CTGCTGTATA TGCTGGCGAT GCCGTTTGTG TTGCAGAAAG 
401 CCGCCGCCAA AACCGACTTC CGGGACATTG CCGTCTGCGC CGCCGTTGTG 

451 GCGGCAGCCG GCTATTTCAC CGGCCATTTG AGTTACTACG ACCGGGGTCG 

501 GATGGCCAAT ATCTTCGGCG CAAACAACTT CTACTACGCC AAAAGTCAGG 

551 CGATGCTCTA CACCGTCAGC CAGAATGCCG ACTTTATTAC CGCCGGCCTG 

601 GTCGATCCCG TCTTCCTCCC CTTGGGCAAT CAACAGCGTG CCGCCACGCA 

651 TCTGAACGAG CCGAAATCTC AAAAAATCCT CTTTATCGTC GCCGAATCTT 

701 GGGGGCTGCC GGCCAATCCC GAACTTCAAA ACGCCACTTT TGCCAAACTG 

751 CTGGCGCAAA AAGACCGTTT TTCGGTTTGG GAAAGCGGCA GTTTTCCCTT 

801 CATCGGCGCG ACGGTCGAAG GCGAAATGCG CGAACTGTGT GCCTACGGCG 

851 GTTTGCGCGG GTTCGCACTG CGCCGCGCGC CCGACGAAAA ATTTGCCCGC 

901 TGCCTCCCCA ACCGTTTGAA ACAAGAAGGT TACGCCACCT TTGCGATGCA 

951 CGGCGCGGGC AGTTCGCTTT ACGACCGCTT CAGCTGGTAT CCGAGGGCGG 

1001 GCTTTCAAGA AATCAAAACC GCCGAAAACC TGATCGGTAA AAAAACCTGC 

1051 GCCATTTTCG GCGGCGTGTG CGACAGCGAG CTGTTCGGCG AAGTGTCGGC 

1101 ATTTTTCAAA AAACACGACA AGGGACTGTT TTACTGGATG ACGCTGACCA 

1151 GCCACGCCGA CTATCCCGAA TCCGACATTT TCAACCACAG GCTCAAATGC 

1201 ACCGAATATG GCCTGCCCGC CGAAACCGAC CTCTGCCGCA ATTTCAGCCT 

1251 GCACACCCAA TTCTTCGACC AACTGGCGGA TTTGATCCAA CGCCCCGAAA 

1301 TGAAAGGCAC GGAAGTCATC ATCGTCGGCG ACCATCCGCC GCCCGTCGGC 

1351 AACCTCAATG AAACCTTCCG CTACCTCAAA CAGGGGCACG TCGCCTGGCT 

1401 GAACTTCAAA ATCAAATAA 
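
Because each line of a listing starts with the coordinate of its first base, a listing can be joined into one string and checked for dropped or duplicated blocks at the same time. This is a minimal sketch under that assumption; `parse_listing` is a hypothetical helper, and it expects every block to contain sequence characters only.

```python
def parse_listing(text: str) -> str:
    """Join a numbered sequence listing, verifying each line's coordinate."""
    sequence = ""
    for line in text.splitlines():
        fields = line.split()
        if not fields:
            continue  # skip blank lines between rows
        coordinate, blocks = int(fields[0]), fields[1:]
        if coordinate != len(sequence) + 1:
            raise ValueError(
                f"line starts at {coordinate}, expected {len(sequence) + 1}"
            )
        sequence += "".join(blocks)
    return sequence


if __name__ == "__main__":
    # First two rows of <SEQ ID 469> as printed above.
    listing = """
    1 ATGAATATTC ACACCCTGCT CTCCAAACAA TGGACGCTGC CGCCATTCCT

    51 GCCGAAACGG CTGCTGCTGT CCCTGCTGAT ACTGCTTGCC CCCAATGCGG
    """
    print(len(parse_listing(listing)))
```

A coordinate mismatch raised by this check points at a row whose blocks were lost or garbled during extraction.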

This corresponds to the amino acid sequence <SEQ ID 470; ORF48-1>: 

1 MNIHTLLSKQ WTLPPFLPKR LLLSLLILLA PNAVFWVLAL LTATARPIVN 

51 LDYLPAALLI ALPWRFVKIA GVLAFWLAVL FDGLMMVIQL FPFMDLIGAI 

101 NLVPFILTAP APYQIMTGLL LLYMLAMPFV LQKAAAKTDF RHIAVCAAVV 

151 AAAGYFTGHL SYYDRGRMAN IFGANNFYYA KSQAMLYTVS QNADFITAGL 

201 VDPVFLPLGN QQRAATHLNE PKSQKILFIV AESWGLPANP ELQNATFAKL 

251 LAQKDRFSVW ESGSFPFIGA TVEGEMRELC AYGGLRGFAL RRAPDEKFAR 

301 CLPNRLKQEG YATFAMHGAG SSLYDRFSWY PRAGFQEIKT AENLIGKKTC 

351 AIFGGVCDSE LFGEVSAFFK KHDKGLFYWM TLTSHADYPE SDIFNHRLKC 

401 TEYGLPAETD LCRNFSLHTQ FFDQLADLIQ RPEMKGTEVI IVGDHPPPVG 

451 NLNETFRYLK QGHVAWLNFK IK* 

Computer analysis of this amino acid sequence gave the following results: 
Homology with a predicted ORF from N.meningitidis (strain A) 

ORF48 shows 94.1% identity over a 119aa overlap with an ORF (ORF48a) from strain A of N. 
meningitidis: 

                    10        20        30        40        50        60
orf48.pep   MNIHTLLSKQWTLPPFLPKRLLLSLLILLAPNAVFWVLALLTATARPIVNLDYLPAALLI
            ||||||||||||||||||||||||||||| ||||||||||||||||||||| ||||||||
orf48a      MNIHTLLSKQWTLPPFLPKRLLLSLLILLXPNAVFWVLALLTATARPIVNLXYLPAALLI
                    10        20        30        40        50        60

                    70        80        90       100       110      119
orf48.pep   ALPWRFVKIAGVLAFWLAVLFDGLMMVIQLFPFMDLIGAINLVPFILTAPAPYQIMTGL
            ||||| ||| |||| ||||||||||||||||||||||||||||||| |||| |||||||
orf48a      ALPWRXVKIXGVLAXWLAVLFDGLMMVIQLFPFMDLIGAINLVPFIXTAPALYQIMTGLL
                    70        80        90       100       110       120

orf48a      LLYMLAMPFVLQKAAAKTDFRHIAACAAVVVAAGYFTGHLSXYDRGRMANIFGANNFYYA
                   130       140       150       160       170       180

The complete length ORF48a nucleotide sequence <SEQ ID 471> is: 



1 ATGAATATTC ACACCCTGCT CTCCAAACAA TGGACGCTGC CGCCATTCCT 

51 GCCGAAACGG CTGCTGCTGT CCCTGCTGAT ACTGCTNNCC CCCAATGCGG 

101 TGTTTTGGGT TTTGGCACTG CTGACCGCCA CCGCCCGCCC GATTGTCAAT 

151 TTGGANTACC TTCCCGCCGC GCTGCTGATC GCCCTGCCTT GGCGTNTCGT 

201 CAAAATTGNC GGCGTATTGG CGTNTTGGCT GGCGGTTTTG TTTGACGGGC 

251 TGATGATGGT GATCCAACTC TTCCCTTTTA TGGATCTCAT CGGCGCCATC 

301 AACCTCGTCC CCTTCATCNT GACCGCCCCC GCCCTTTATC AGATAATGAC 

351 CGGGCTGTTA CTGCTGTATA TGCTGGCGAT GCCGTTTGTG TTGCAGAAAG 

401 CCGCCGCCAA AACCGACTTC CGACACATTG CCGCCTGTGC CGCCGTTGTG 

451 GTGGCAGCCG GCTATTTTAC CGGCCATTTG AGTTANTACG ACCGGGGGCG 
501 GATGGCCAAT ATCTTCGGCG CAAACAACTT CTATTACGCC AAAAGTCAGG 

551 CGATGCTCTA CACCGTCAGC CAGAATGCCG ACTTTATTAC CGCCGGCCTG 

601 GTCGATCCCG TCTTCCTCCC CTTGGGCAAT CAACAGCGTG CCGCCACGCA 

651 TCTGAACGAG CCGAAATCTC AAAAAATCCT CTTTATCGTC GCCGAATCTT 

701 GGGGGCTGCC GGCCAATCCC GAACTTCAAA ACGCCACTTT TGCCAAACTG 

751 CTGGCGCAAA AAGANCGTTT TTCGGTTTGG GAAAGCGGCA GTTTTCCCTT 

801 CATCGGCGCG ACGATCGAAG GCGAAATGCG CGAACTGTGT GCCTACGGCG 

851 GTTTGCGCGG GTTCGCACTG CGCCGCGCGC CCGACGAAAA ATTTGCCCGC 

901 TGCCTCCCCA ACCGTTTGAA ACAAGAAGGT TACGCCACCT TTGCGATGCA 

951 CGGCGCGGGC AGTTCGCTTT ACGACCGCTT CAGCTGGTAT CCGAGGGCGG 

1001 GCTTTCAAGA AATCAAAACC GCCGAAAACC TGATCGGTAA AAAAACCTGC 

1051 GCCATTTTCG GCGGCGTGTG CGACAGCGAG CTGTTCGGCG AAGTGTCGGC 

1101 ANTTTTCAAA AAACACGACA AGGGACTGTT TTACTGGATG ACGCTGACCA 

1151 GCCACGCCGA CTATCCCGAA TCNGACATTT TCAACCACAG GCTCAAATGC 

1201 ACCGAATATG GCCTGCCCGC CGAAACCGAC NTCTGCCGCA ATTTCAGCCT 

1251 GCACACCCAA TTCTTCGACC AACTGGCGGA TTTGATCCAA CGCCCCGAAA 

1301 TGAAAGGCAC GGAAGTCATC ATCGTCGGCG ACCATCCGCC GCCCGTCGGC 

1351 AACCTCAATG AAACCTTCCG CTACCTCAAA CAGGGGCACG TCGNCTGGCT 

1401 GAACTTCAAA ATCAAATAA 

This encodes a protein having amino acid sequence <SEQ ID 472>: 

1 MNIHTLLSKQ WTLPPFLPKR LLLSLLILLX PNAVFWVLAL LTATARPIVN 

51 LXYLPAALLI ALPWRXVKIX GVLAXWLAVL FDGLMMVIQL FPFMDLIGAI 

101 NLVPFIXTAP ALYQIMTGLL LLYMLAMPFV LQKAAAKTDF RHIAACAAVV 

151 VAAGYFTGHL SXYDRGRMAN IFGANNFYYA KSQAMLYTVS QNADFITAGL 

201 VDPVFLPLGN QQRAATHLNE PKSQKILFIV AESWGLPANP ELQNATFAKL 

251 LAQKXRFSVW ESGSFPFIGA TIEGEMRELC AYGGLRGFAL RRAPDEKFAR 

301 CLPNRLKQEG YATFAMHGAG SSLYDRFSWY PRAGFQEIKT AENLIGKKTC 

351 AIFGGVCDSE LFGEVSAXFK KHDKGLFYWM TLTSHADYPE SDIFNHRLKC 

401 TEYGLPAETD XCRNFSLHTQ FFDQLADLIQ RPEMKGTEVI IVGDHPPPVG 

451 NLNETFRYLK QGHVXWLNFK IK* 

ORF48a and ORF48-1 show 96.8% identity in 472 aa overlap: 

                    10        20        30        40        50        60
orf48a.pep  MNIHTLLSKQWTLPPFLPKRLLLSLLILLXPNAVFWVLALLTATARPIVNLXYLPAALLI
            ||||||||||||||||||||||||||||| ||||||||||||||||||||| ||||||||
orf48-1     MNIHTLLSKQWTLPPFLPKRLLLSLLILLAPNAVFWVLALLTATARPIVNLDYLPAALLI
                    10        20        30        40        50        60

                    70        80        90       100       110       120
orf48a.pep  ALPWRXVKIXGVLAXWLAVLFDGLMMVIQLFPFMDLIGAINLVPFIXTAPALYQIMTGLL
            ||||| ||| |||| ||||||||||||||||||||||||||||||| |||| ||||||||
orf48-1     ALPWRFVKIAGVLAFWLAVLFDGLMMVIQLFPFMDLIGAINLVPFILTAPAPYQIMTGLL
                    70        80        90       100       110       120

                   130       140       150       160       170       180
orf48a.pep  LLYMLAMPFVLQKAAAKTDFRHIAACAAVVVAAGYFTGHLSXYDRGRMANIFGANNFYYA
            ||||||||||||||||||||||||:|||||:|||||||||| ||||||||||||||||||
orf48-1     LLYMLAMPFVLQKAAAKTDFRHIAVCAAVVAAAGYFTGHLSYYDRGRMANIFGANNFYYA
                   130       140       150       160       170       180

                   190       200       210       220       230       240
orf48a.pep  KSQAMLYTVSQNADFITAGLVDPVFLPLGNQQRAATHLNEPKSQKILFIVAESWGLPANP
            ||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
orf48-1     KSQAMLYTVSQNADFITAGLVDPVFLPLGNQQRAATHLNEPKSQKILFIVAESWGLPANP
                   190       200       210       220       230       240

                   250       260       270       280       290       300
orf48a.pep  ELQNATFAKLLAQKXRFSVWESGSFPFIGATIEGEMRELCAYGGLRGFALRRAPDEKFAR
            |||||||||||||| ||||||||||||||||:||||||||||||||||||||||||||||
orf48-1     ELQNATFAKLLAQKDRFSVWESGSFPFIGATVEGEMRELCAYGGLRGFALRRAPDEKFAR
                   250       260       270       280       290       300

                   310       320       330       340       350       360
orf48a.pep  CLPNRLKQEGYATFAMHGAGSSLYDRFSWYPRAGFQEIKTAENLIGKKTCAIFGGVCDSE
            ||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
orf48-1     CLPNRLKQEGYATFAMHGAGSSLYDRFSWYPRAGFQEIKTAENLIGKKTCAIFGGVCDSE
                   310       320       330       340       350       360

                   370       380       390       400       410       420
orf48a.pep  LFGEVSAXFKKHDKGLFYWMTLTSHADYPESDIFNHRLKCTEYGLPAETDXCRNFSLHTQ
            ||||||| |||||||||||||||||||||||||||||||||||||||||| |||||||||
orf48-1     LFGEVSAFFKKHDKGLFYWMTLTSHADYPESDIFNHRLKCTEYGLPAETDLCRNFSLHTQ
                   370       380       390       400       410       420

                   430       440       450       460       470
orf48a.pep  FFDQLADLIQRPEMKGTEVIIVGDHPPPVGNLNETFRYLKQGHVXWLNFKIKX
            |||||||||||||||||||||||||||||||||||||||||||| ||||||||
orf48-1     FFDQLADLIQRPEMKGTEVIIVGDHPPPVGNLNETFRYLKQGHVAWLNFKIKX
                   430       440       450       460       470
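
The percent-identity figures quoted for these alignments can be reproduced by counting identical positions in two pre-aligned strings. The sketch below is a minimal illustration, assuming gap-free rows of equal length and counting an X against a defined residue as a mismatch; the source does not state how its alignment program scored X, so that choice is an assumption, and `percent_identity` is a hypothetical name.

```python
def percent_identity(row_a: str, row_b: str) -> tuple[int, float]:
    """Return (identical positions, percent identity) for two aligned rows."""
    if len(row_a) != len(row_b):
        raise ValueError("aligned rows must have the same length")
    matches = sum(1 for a, b in zip(row_a, row_b) if a == b)
    return matches, 100.0 * matches / len(row_a)


# Residues 1-60 of ORF48a and ORF48-1, copied from the alignment above.
ORF48A_1_60 = "MNIHTLLSKQWTLPPFLPKRLLLSLLILLXPNAVFWVLALLTATARPIVNLXYLPAALLI"
ORF48_1_1_60 = "MNIHTLLSKQWTLPPFLPKRLLLSLLILLAPNAVFWVLALLTATARPIVNLDYLPAALLI"

if __name__ == "__main__":
    matches, pct = percent_identity(ORF48A_1_60, ORF48_1_1_60)
    print(f"{matches}/{len(ORF48A_1_60)} identical ({pct:.1f}%)")
```

Applied row by row and summed over the whole overlap, the same count gives the overall figure reported for a pair of ORFs.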

Homology with a predicted ORF from N.gonorrhoeae 

ORF48 shows 97.5% identity over a 119aa overlap with a predicted ORF (ORF48ng) from N. 
gonorrhoeae: 

orf48.pep   MNIHTLLSKQWTLPPFLPKRLLLSLLILLAPNAVFWVLALLTATARPIVNLDYLPAALLI 60
            ||||:|||:|||||||||||||||||||||||||||||||||||||||||||||||||||
orf48ng     MNIHALLSEQWTLPPFLPKRLLLSLLILLAPNAVFWVLALLTATARPIVNLDYLPAALLI 60

orf48.pep   ALPWRFVKIAGVLAFWLAVLFDGLMMVIQLFPFMDLIGAINLVPFILTAPAPYQIMTGL  119
            |||||||||||||||| ||||||||||||||||||||||||||||||||||||||||||
orf48ng     ALPWRFVKIAGVLAFWPAVLFDGLMMVIQLFPFMDLIGAINLVPFILTAPAPYQIMTGLL 120

The ORF48ng nucleotide sequence <SEQ ID 473> was predicted to encode a protein having amino 
acid sequence <SEQ ID 474>: 

1 MNIHALLSEQ WTLPPFLPKR LLLSLLILLA PNAVFWVLAL LTATARPIVN 
51 LDYLPAALLI ALPWRFVKIA GVLAFWPAVL FDGLMMVIQL FPFMDLIGAI 
101 NLVPFILTAP APYQIMTGLL LLYMLAMPFV LQKAAVKTDF RHIAVCAAVV 
151 AAARYFTGPF ELLRTGGRWQ YVQHRRLLLS GSRASFRRRQ KADVLRRLGN 
201 PYASMGNGG.. 

Further work identified the complete gonococcal DNA sequence <SEQ ID 475>: 

1 ATGAATATTC ACGCCCTGCT CTCCGAACAA TGGACGCTGC CGCCATTCCT 

51 GCCGAAACGG CTGCTGCTGT CCCTGCTGAT ACTGCTGGCC CCCAATGCGG 

101 TGTTTTGGGT TTTGGCACTG CTGACCGCCA CCGCCCGCCC GATTGTCAAT 

151 TTGGACTACC TTCCCGCCGC GCTGCTGATC GCCCTGCCTT GGCGTTTCGT 

201 CAAAATTGCC GGCGTATTGG CGTTTTGGCC GGCGGTTTTG TTTGACGGGC 

251 TGATGATGGT GATCCAACTC TTCCCTTTTA TGGACCTCAT CGGCGCCATC 

301 AACCTCGTCC CCTTCATCCT GACCGCCCCC GCCCCTTATC AGATAATGAC 

351 CGGGCTGTTG CTGCTGTATA TGCTGGCGAT GCCGTTTGTG TTGCAAAAAG 

401 CCGCCGTCAA AACCGACTTC CGACACATTG CCGTCTGTGC CGCCGTTGTG 

451 GCGGCAGCCG GCTATTTCAC CGGCCATTTG AGTTACTACG ACCGGGGGCG 

501 GATGGCCAAT ATCTTCGGCG CAAACAACTT CTATTACGCc aAAAGTCAGG 

551 CGATGCTCTA CACCGTCAGC CAGAATGCCG ACTTTATTAC CGCCGgcctG 

601 GTCGACCCCG TCTTCCTCCC CTTGGGCAAT CAGCAGCGTG CCGCCACGCG 

651 GCTGAGTGAG CCGAAATCTC AAAAAATCCT CTTTATCGTC GCCGAATCTT 

701 GGGGGCTGCC GGGCAATCCC GAGCTTCAAA ACGCCACTTT TGCCAAACTG 

751 CTGGCGCAAA AAGACCGTTT TTCGGTTTGG GAAAGCGGCA GTTTTCCCTT 

801 CATCGGCGCG ACGGTCGAAG GCGAAATGCG CGAATTGTGC GCCTACGGCG 

851 GTTTGCGCGG GTTCGCACTG CGCCGCGCGC CCGACGAAAA ATTTGCCCGC 

901 TGCCTCCCCA ACCGTTTGAA ACAAGAAGGT TACGCCACCT TTGCGATGCA 

951 CGGCGCGGGT AGTTCGCTTT ACGACCGCTT CAGCTGGTAT CCGAGGGCGG 

1001 GCTTTCAAAA AATCAAAACC GCCGAAAACC TGATCGGTAA AAAAACCTGC 

1051 GCCATTTTCG GCGGCGTGTG CGACAGCGAG CTGTTCGGCG AAGTGTCGGC 

1101 ATTTTTCAAA AAACACGACA AGGGACTGTT TTACTGGATG ACGCTGACCA 

1151 GCCACGCCGA CTATCCCGAA TCCGACATTT TCAACCACAG GCTCAAATGC 

1201 ACCGAATACG GCCTGCCCGC CGAAACCGAC CTCTGCCGCA ATTTCAGCCT 

1251 GCACACCCAA TtCttcgACC AACTGGCGGA TTTGATCCGA CGCCCCGAAA 

1301 TGAAAGGCAC GGAAGTCATC ATCGTCGGCG ACCATCCGCC GCCCGTCGGC 

1351 AACCTCAATG AAACCTTCCG CTACCTCAAA CAGGGACACG TCGCCTGGCT 

1401 GCACTTCAAA ATCAAATAA 






This encodes a protein having amino acid sequence <SEQ ID 476; ORF48ng-1>: 

1 MNIHALLSEQ WTLPPFLPKR LLLSLLILLA PNAVFWVLAL LTATARPIVN 

51 LDYLPAALLI ALPWRFVKIA GVLAFWPAVL FDGLMMVIQL FPFMDLIGAI 

101 NLVPFILTAP APYQIMTGLL LLYMLAMPFV LQKAAVKTDF RHIAVCAAVV 

151 AAAGYFTGHL SYYDRGRMAN IFGANNFYYA KSQAMLYTVS QNADFITAGL 

201 VDPVFLPLGN QQRAATRLSE PKSQKILFIV AESWGLPGNP ELQNATFAKL 

251 LAQKDRFSVW ESGSFPFIGA TVEGEMRELC AYGGLRGFAL RRAPDEKFAR 

301 CLPNRLKQEG YATFAMHGAG SSLYDRFSWY PRAGFQKIKT AENLIGKKTC 

351 AIFGGVCDSE LFGEVSAFFK KHDKGLFYWM TLTSHADYPE SDIFNHRLKC 

401 TEYGLPAETD LCRNFSLHTQ FFDQLADLIR RPEMKGTEVI IVGDHPPPVG 

451 NLNETFRYLK QGHVAWLHFK IK* 

ORF48ng-1 and ORF48-1 show 97.9% identity in 472 aa overlap: 

                    10        20        30        40        50        60
orf48-1.pep MNIHTLLSKQWTLPPFLPKRLLLSLLILLAPNAVFWVLALLTATARPIVNLDYLPAALLI
            ||||:|||:|||||||||||||||||||||||||||||||||||||||||||||||||||
orf48ng-1   MNIHALLSEQWTLPPFLPKRLLLSLLILLAPNAVFWVLALLTATARPIVNLDYLPAALLI
                    10        20        30        40        50        60

                    70        80        90       100       110       120
orf48-1.pep ALPWRFVKIAGVLAFWLAVLFDGLMMVIQLFPFMDLIGAINLVPFILTAPAPYQIMTGLL
            |||||||||||||||| |||||||||||||||||||||||||||||||||||||||||||
orf48ng-1   ALPWRFVKIAGVLAFWPAVLFDGLMMVIQLFPFMDLIGAINLVPFILTAPAPYQIMTGLL
                    70        80        90       100       110       120

                   130       140       150       160       170       180
orf48-1.pep LLYMLAMPFVLQKAAAKTDFRHIAVCAAVVAAAGYFTGHLSYYDRGRMANIFGANNFYYA
            |||||||||||||||:||||||||||||||||||||||||||||||||||||||||||||
orf48ng-1   LLYMLAMPFVLQKAAVKTDFRHIAVCAAVVAAAGYFTGHLSYYDRGRMANIFGANNFYYA
                   130       140       150       160       170       180

                   190       200       210       220       230       240
orf48-1.pep KSQAMLYTVSQNADFITAGLVDPVFLPLGNQQRAATHLNEPKSQKILFIVAESWGLPANP
            ||||||||||||||||||||||||||||||||||||:|:||||||||||||||||||:||
orf48ng-1   KSQAMLYTVSQNADFITAGLVDPVFLPLGNQQRAATRLSEPKSQKILFIVAESWGLPGNP
                   190       200       210       220       230       240

                   250       260       270       280       290       300
orf48-1.pep ELQNATFAKLLAQKDRFSVWESGSFPFIGATVEGEMRELCAYGGLRGFALRRAPDEKFAR
            ||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
orf48ng-1   ELQNATFAKLLAQKDRFSVWESGSFPFIGATVEGEMRELCAYGGLRGFALRRAPDEKFAR
                   250       260       270       280       290       300

                   310       320       330       340       350       360
orf48-1.pep CLPNRLKQEGYATFAMHGAGSSLYDRFSWYPRAGFQEIKTAENLIGKKTCAIFGGVCDSE
            ||||||||||||||||||||||||||||||||||||:|||||||||||||||||||||||
orf48ng-1   CLPNRLKQEGYATFAMHGAGSSLYDRFSWYPRAGFQKIKTAENLIGKKTCAIFGGVCDSE
                   310       320       330       340       350       360

                   370       380       390       400       410       420
orf48-1.pep LFGEVSAFFKKHDKGLFYWMTLTSHADYPESDIFNHRLKCTEYGLPAETDLCRNFSLHTQ
            ||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
orf48ng-1   LFGEVSAFFKKHDKGLFYWMTLTSHADYPESDIFNHRLKCTEYGLPAETDLCRNFSLHTQ
                   370       380       390       400       410       420

                   430       440       450       460       470
orf48-1.pep FFDQLADLIQRPEMKGTEVIIVGDHPPPVGNLNETFRYLKQGHVAWLNFKIKX
            |||||||||:|||||||||||||||||||||||||||||||||||||:|||||
orf48ng-1   FFDQLADLIRRPEMKGTEVIIVGDHPPPVGNLNETFRYLKQGHVAWLHFKIKX
                   430       440       450       460       470

Based on this analysis, including the presence of a putative leader sequence (double-underlined) 
and two putative transmembrane domains (single-underlined) in the gonococcal protein, it is 
predicted that the proteins from N. meningitidis and N. gonorrhoeae, and their epitopes, could be 
useful antigens for vaccines or diagnostics, or for raising antibodies. 
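
The source does not say which program produced the leader and transmembrane predictions. One common heuristic, shown here purely as an assumption, is a Kyte-Doolittle hydropathy profile with a 19-residue window, flagging windows whose mean score exceeds about 1.6. The function names below are hypothetical, and unknown residues such as X are scored as 0.

```python
# Kyte-Doolittle hydropathy values for the 20 standard amino acids.
KYTE_DOOLITTLE = {
    "A": 1.8, "R": -4.5, "N": -3.5, "D": -3.5, "C": 2.5,
    "Q": -3.5, "E": -3.5, "G": -0.4, "H": -3.2, "I": 4.5,
    "L": 3.8, "K": -3.9, "M": 1.9, "F": 2.8, "P": -1.6,
    "S": -0.8, "T": -0.7, "W": -0.9, "Y": -1.3, "V": 4.2,
}


def hydropathy_profile(protein: str, window: int = 19) -> list[float]:
    """Mean hydropathy of every full window along the protein."""
    scores = [KYTE_DOOLITTLE.get(aa, 0.0) for aa in protein]
    return [
        sum(scores[i:i + window]) / window
        for i in range(len(scores) - window + 1)
    ]


def candidate_tm_windows(
    protein: str, window: int = 19, threshold: float = 1.6
) -> list[int]:
    """1-based start positions of windows whose mean exceeds the threshold."""
    profile = hydropathy_profile(protein, window)
    return [i + 1 for i, score in enumerate(profile) if score > threshold]


# Residues 21-45 of ORF48ng-1 as listed above (a hydrophobic N-terminal stretch).
ORF48NG_21_45 = "LLLSLLILLAPNAVFWVLALLTATA"

if __name__ == "__main__":
    print(candidate_tm_windows(ORF48NG_21_45))
```

A window flagged by this heuristic is only a candidate; it does not distinguish a cleavable leader sequence from a membrane-spanning segment.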
Example 57 

The following partial DNA sequence was identified in N. meningitidis <SEQ ID 477>: 

1 . .GTGAGCGGAC GTTACCGCGC TTTGGATCGC GTTTCCAAAA TCATCATCGT 

51 TACTTTGAGT ATCGCCACGC TTGCCGCCGC CGGCATCGCT ATGTCGCGCG 

101 GTATGCAGAT GCAGTCCGAT TTTATCGAGC CGACACCGTG GACGCTTGCC 

151 GGTTTGGGCT TCCTGATCGC GCTGATGGGC TGGATGCCCG CGCCGATTGA 

201 AATTTCCGCC ATCAATTCTT TGTGGGTAAC CGAAAAACAA CGCATCAATC 

251 CTTCCGAATA CCGCGACGGG ATTTTTGAAT TCAACGTCGG TTATATCGCC 

301 AGTGCGGTTT TGGCTTTGGT TTTCCTTGCA CTGGGCGC.G TAGCGCCGAA 

351 CGGCAACGGC GA.ACAGTGC AGATGGCGGG CGGCAAATAT AACGGGCAAT 

401 TGATCAATAT GTACGCC. . 

This corresponds to the amino acid sequence <SEQ ID 478; ORF53>: 



1 ..VSGRYRALDR VSKIIIVTLS IATLAAAGIA MSRGMQMQSD FIEPTPWTLA 

51 GLGFLIALMG WMPAPIEISA INSLWVTEKQ RINPSEYRDG IFEFNVGYIA 

101 SAVLALVFLA LGXVAPNGNG XTVQMAGGKY NGQLINMYA.. 

Further work revealed the complete nucleotide sequence <SEQ ID 479>: 

1 ATGTCCGAAC AACATATTTC GACTTGGAAA AGTAAAATCA ACGCATTGGG 

51 TCCGGGGATC ATGATGGCTT CGGCGGCGGT CGGCGGTTCG CACCTGATTG 

101 CCTCGACGCA GGCGGGCGCG CTTTACGGCT GGCAGATCGC GCTCATCATC 

151 ATCCTGACCA ACCTCTTCAA ATACCCGTTT TTCCGCTTCA GCGCGCATTA 

201 CACGCTGGAC ACGGGCAAGA GCCTGATTGA AGGTTATGCC GAGAAAAGCC 

251 GCGTTTATTT GTGGGTATTC CTGATTTTGT GCATCCTCTC CGCCACGATT 

301 AACGCGGGCG CGGTCGCCAT TGTAACCGCC GCCATCGTCA AAATGGCGAT 

351 TCCCTCGCTG ATGTTTGATG CCGGCACGGT TGCCGCCTTG ATTATGGCAT 

401 CCTGCCTGAT TATTTTGGTG AGCGGACGTT ACCGCGCTTT GGATCGCGTT 

451 TCCAAAATCA TCATCGTTAC TTTGAGTATC GCCACGCTTG CCGCCGCCGG 

501 CATCGCTATG TCGCGCGGTA TGCAGATGCA GTCCGATTTT ATCGAGCCGA 

551 CACCGTGGAC GCTTGCCGGT TTGGGCTTCC TGATCGCGCT GATGGGCTGG 

601 ATGCCCGCGC CGATTGAAAT TTCCGCCATC AATTCTTTGT GGGTAACCGA 

651 AAAACAACGC ATCAATCCTT CCGAATACCG CGACGGGATT TTTGATTTCA 

701 ACGTCGGTTA TATCGCCAGT GCGGTTTTGG CTTTGGTTTT CCTTGCACTG 

751 GGCGCGTTTG TGCAATACGG CAACGGCGAA GCAGTGCAGA TGGCGGGCGG 

801 CAAATATATC GGGCAATTGA TCAATATGTA CGCCGTTACC ATCGGCGGCT 

851 GGTCGCGCCC GCTGGTGGCG TTTATCGCGT TTGCCTGTAT GTACGGCACG 

901 ACGATTACCG TCGTGGACGG CTATGCCCGT GCCATTGCCG AACCCGTGCG 

951 CCTGCTGCGC GGAAAAGACA AAACGGGCAA CGCCGAATTC TTTGCCTGGA 

1001 ATATTTGGGT GGCGGGCAGC GGTTTGGCGG TGATTTTCTG GTTTGACGGC 

1051 GTAATGGCGA ATCTGCTCAA ATTTGCGATG ATTGCCGCTT TTGTGTCCGC 

1101 CCCTGTGTTT GCCTGGCTGA ATTACCGTTT GGTTAAAGGT GATGAAAAAC 

1151 ACAAACTCAC ATCAGGTATG AATGCCCTTG CATTGGCAGG CTTGATTTAT 

1201 CTGACCGGTT TTACCGTTTT GTTCTTATTG AATTTGGCGG GAATGTTCAA 

1251 ATGA 

This corresponds to the amino acid sequence <SEQ ID 480; ORF53-1>: 

1 MSEQHISTWK SKINALGPGI MMASAAVGGS HLIASTQAGA LYGWQIALII 

51 ILTNLFKYPF FRFSAHYTLD TGKSLIEGYA EKSRVYLWVF LILCILSATI 

101 NAGAVAIVTA AIVKMAIPSL MFDAGTVAAL IMASCLIILV SGRYRALDRV 

151 SKIIIVTLSI ATLAAAGIAM SRGMQMQSDF IEPTPWTLAG LGFLIALMGW 

201 MPAPIEISAI NSLWVTEKQR INPSEYRDGI FDFNVGYIAS AVLALVFLAL 

251 GAFVQYGNGE AVQMAGGKYI GQLINMYAVT IGGWSRPLVA FIAFACMYGT 

301 TITVVDGYAR AIAEPVRLLR GKDKTGNAEF FAWNIWVAGS GLAVIFWFDG 

351 VMANLLKFAM IAAFVSAPVF AWLNYRLVKG DEKHKLTSGM NALALAGLIY 

401 LTGFTVLFLL NLAGMFK* 

Computer analysis of this amino acid sequence gave the following results: 
Homology with a predicted ORF from N.meningitidis (strain A) 

ORF53 shows 93.5% identity over a 139aa overlap with an ORF (ORF53a) from strain A of N. 
meningitidis: 

                                                  10        20        30
orf53.pep                                 VSGRYRALDRVSKIIIVTLSIATLAAAGIA
                                          ||||||||||||||||||||||||||||||
orf53a      AAIVKMAIPSLMFDAGTVAALIMASCLIILVSGRYRALDRVSKIIIVTLSIATLAAAGIA
          110       120       130       140       150       160

                    40        50        60        70        80        90
orf53.pep   MSRGMQMQSDFIEPTPWTLAGLGFLIALMGWMPAPIEISAINSLWVTEKQRINPSEYRDG
            ||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
orf53a      MSRGMQMQSDFIEPTPWTLAGLGFLIALMGWMPAPIEISAINSLWVTEKQRINPSEYRDG
          170       180       190       200       210       220

                   100       110       120       130      139
orf53.pep   IFEFNVGYIASAVLALVFLALGXVAPNGNGXTVQMAGGKYNGQLINMYA
            ||:|||||||||||||||||||  :  ||| :|||||||| ||||||||
orf53a      IFDFNVGYIASAVLALVFLALGAFVQYGNGEAVQMAGGKYIGQLINMYAVTIGGWSRPLV
          230       240       250       260       270       280

orf53a      AFIAFACMYGTTITVVDGYARAIAEPVRLLRGKDKTGNAEFFAWNIWVAGSGLAVIFWFD
          290       300       310       320       330       340

The complete length ORF53a nucleotide sequence <SEQ ID 481> is: 



1 ATGTCCGAAC AACATATTTC GACTTGGAAA AGTAAAATCA ACGCATTGGG 

51 ACCGGGGATT ATGATGGCTT CGGCGGCGGT CGGCGGTTCG CACCTGATTG 

101 CCTCGACGCA GGCGGGCGCG CTTTACGGCT GGCAGATCGC GCTCATCATC 

151 ATCCTGACCA ACCTCTTCAA ATACCCGTTT TTCCGCTTCA GCGCGCATTA 

201 CACGCTGGAC ACGGGCAAGA GCCTGATTGA AGGTTATGCC GAGAAAAGCC 

251 GCGTTTATTT GTGGGTATTC CTGATTTTGT GCATCCTCTC CGCCACGATT 

301 AACGCGGGCG CGGTCGCCAT TGTAACCGCC GCCATCGTCA AAATGGCGAT 

351 TCCCTCGCTG ATGTTTGATG CCGGCACGGT TGCCGCCTTG ATTATGGCAT 

401 CCTGCCTGAT TATTTTGGTG AGCGGACGTT ACCGCGCTTT GGATCGCGTT 

451 TCCAAAATCA TCATCGTTAC TTTGAGTATC GCCACGCTTG CCGCCGCCGG 

501 CATCGCTATG TCGCGCGGTA TGCAGATGCA GTCCGATTTT ATCGAGCCGA 

551 CACCGTGGAC GCTTGCCGGT TTGGGCTTCC TGATCGCGCT GATGGGCTGG 

601 ATGCCCGCGC CGATTGAAAT TTCCGCCATC AATTCTTTGT GGGTAACCGA 

651 AAAACAACGC ATCAATCCTT CCGAATACCG CGACGGGATT TTTGATTTCA 

701 ACGTCGGTTA TATCGCCAGT GCGGTTTTGG CTTTGGTTTT CCTTGCACTG 

751 GGCGCGTTTG TGCAATACGG CAACGGCGAA GCAGTGCAGA TGGCGGGCGG 

801 CAAATATATC GGGCAATTGA TCAATATGTA CGCCGTTACC ATCGGCGGCT 

851 GGTCGCGCCC GCTGGTGGCG TTTATCGCGT TTGCCTGTAT GTACGGCACG 

901 ACGATTACCG TTGTGGACGG CTATGCCCGT GCCATTGCCG AACCCGTGCG 

951 CCTGCTGCGC GGAAAAGACA AAACGGGCAA CGCCGAATTC TTTGCCTGGA 

1001 ATATTTGGGT GGCGGGCAGC GGTTTGGCGG TGATTTTCTG GTTTGACGGC 

1051 GTAATGGCGA ATCTGCTCAA ATTTGCGATG ATTGCCGCTT TTGTGTCCGC 

1101 CCCTGTGTTT GCCTGGCTGA ATTACCGTTT GGTCAAAGGT GATGAAAAAC 

1151 ACAAACTCAC ATCAGGTATG AATGCCCTTG CATTGGCAGG CTTGATTTAT 

1201 CTGACCGGTT TTACCGTTTT GTTCTTATTG AATTTGGCGG GAATGTTCAA 

1251 ATGA 

This encodes a protein having amino acid sequence <SEQ ID 482>: 



1 MSEQHISTWK SKINALGPGI MMASAAVGGS HLIASTQAGA LYGWQIALII 

51 ILTNLFKYPF FRFSAHYTLD TGKSLIEGYA EKSRVYLWVF LILCILSATI 

101 NAGAVAIVTA AIVKMAIPSL MFDAGTVAAL IMASCLIILV SGRYRALDRV 

151 SKIIIVTLSI ATLAAAGIAM SRGMQMQSDF IEPTPWTLAG LGFLIALMGW 

201 MPAPIEISAI NSLWVTEKQR INPSEYRDGI FDFNVGYIAS AVLALVFLAL 

251 GAFVQYGNGE AVQMAGGKYI GQLINMYAVT IGGWSRPLVA FIAFACMYGT 

301 TITVVDGYAR AIAEPVRLLR GKDKTGNAEF FAWNIWVAGS GLAVIFWFDG 

351 VMANLLKFAM IAAFVSAPVF AWLNYRLVKG DEKHKLTSGM NALALAGLIY 

401 LTGFTVLFLL NLAGMFK* 

ORF53a shows 100.0% identity in 417 aa overlap with ORF53-1: 

                    10        20        30        40        50        60
orf53a.pep  MSEQHISTWKSKINALGPGIMMASAAVGGSHLIASTQAGALYGWQIALIIILTNLFKYPF
            ||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
orf53-1     MSEQHISTWKSKINALGPGIMMASAAVGGSHLIASTQAGALYGWQIALIIILTNLFKYPF
                    10        20        30        40        50        60

                    70        80        90       100       110       120
orf53a.pep  FRFSAHYTLDTGKSLIEGYAEKSRVYLWVFLILCILSATINAGAVAIVTAAIVKMAIPSL
            ||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
orf53-1     FRFSAHYTLDTGKSLIEGYAEKSRVYLWVFLILCILSATINAGAVAIVTAAIVKMAIPSL
                    70        80        90       100       110       120

                   130       140       150       160       170       180
orf53a.pep  MFDAGTVAALIMASCLIILVSGRYRALDRVSKIIIVTLSIATLAAAGIAMSRGMQMQSDF
            ||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
orf53-1     MFDAGTVAALIMASCLIILVSGRYRALDRVSKIIIVTLSIATLAAAGIAMSRGMQMQSDF
                   130       140       150       160       170       180

                   190       200       210       220       230       240
orf53a.pep  IEPTPWTLAGLGFLIALMGWMPAPIEISAINSLWVTEKQRINPSEYRDGIFDFNVGYIAS
            ||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
orf53-1     IEPTPWTLAGLGFLIALMGWMPAPIEISAINSLWVTEKQRINPSEYRDGIFDFNVGYIAS
                   190       200       210       220       230       240

                   250       260       270       280       290       300
orf53a.pep  AVLALVFLALGAFVQYGNGEAVQMAGGKYIGQLINMYAVTIGGWSRPLVAFIAFACMYGT
            ||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
orf53-1     AVLALVFLALGAFVQYGNGEAVQMAGGKYIGQLINMYAVTIGGWSRPLVAFIAFACMYGT
                   250       260       270       280       290       300

                   310       320       330       340       350       360
orf53a.pep  TITVVDGYARAIAEPVRLLRGKDKTGNAEFFAWNIWVAGSGLAVIFWFDGVMANLLKFAM
            ||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
orf53-1     TITVVDGYARAIAEPVRLLRGKDKTGNAEFFAWNIWVAGSGLAVIFWFDGVMANLLKFAM
                   310       320       330       340       350       360

                   370       380       390       400       410
orf53a.pep  IAAFVSAPVFAWLNYRLVKGDEKHKLTSGMNALALAGLIYLTGFTVLFLLNLAGMFKX
            ||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
orf53-1     IAAFVSAPVFAWLNYRLVKGDEKHKLTSGMNALALAGLIYLTGFTVLFLLNLAGMFKX
                   370       380       390       400       410

Homology with a predicted ORF from N.gonorrhoeae 

ORF53 shows 92.1% identity over a 139aa overlap with a predicted ORF (ORF53ng) from N. 
gonorrhoeae: 

orf53.pep                                 VSGRYRALDRVSKIIIVTLSIATLAAAGIA  30
                                          ||||||||||||||||||||||||||||||
orf53ng     AAIVKMAIPSLMFDAGTVAALIMASCLIILVSGRYRALDRVSKIIIVTLSIATLAAAGIA  91

orf53.pep   MSRGMQMQSDFIEPTPWTLAGLGFLIALMGWMPAPIEISAINSLWVTEKQRINPSEYRDG  90
            |||||||| |||||||||||||||||||||||||||||||||||||||||||||||||||
orf53ng     MSRGMQMQPDFIEPTPWTLAGLGFLIALMGWMPAPIEISAINSLWVTEKQRINPSEYRDG 151

orf53.pep   IFEFNVGYIASAVLALVFLALGXVAPNGNGXTVQMAGGKYNGQLINMYA            139
            ||:|||||||||||||||||||  :  ||| :|||:|||| ||||||||
orf53ng     IFDFNVGYIASAVLALVFLALGAFVQYGNGEAVQMGGGKYIGQLINMYAVTIGGGSRPLV 211



An ORF53ng nucleotide sequence <SEQ ID 483> was predicted to encode a protein having amino 
acid sequence <SEQ ID 484>: 



1 MPKKSCVYLW VFLILCIASA TINAGAVAIV TAAIVKMAIP SLMFDAGTVA 

51 ALIMASCLII LVSGRYRALD RVSKIIIVTL SIATLAAAGI AMSRGMQMQP 

101 DFIEPTPWTL AGLGFLIALM GWMPAPIEIS AINSLWVTEK QRINPSEYRD 

151 GIFDFNVGYI ASAVLALVFL ALGAFVQYGN GEAVQMGGGK YIGQLINMYA 

201 VTIGGGSRPL VAFIAFACMY GAASTVVDGY ARAIAEPVRL LRGKDKTARP 

251 IVLLEKLGGR HRFGRDFLV* 

Further analysis revealed a further partial gonococcal DNA sequence <SEQ ID 485>: 

1 . . aagaAAAGCT GCGTTTATTT GTGGGTTTTT TTGATTTTGT GTATCGCCTC 
51 CGCCACGATT AACGCGGGCG CGGTCGCCAT TGTAACCGCC GCCATCGTCA 
101 AAATGGCGAT TCCCTCGCTG ATGTTTGATG CCGGCACGGT TGCCGCCTTG 
151 ATTATGGCAT CCTGCCTGAT TATTTTGGTG AGCGGACGTT ACCGCGCTTT 

201 GGATCGTGTT TCCAAAATCA TCATTGTTAC TTTGAGCATC GCCACGCTTG 

251 CCGCCGCCGG CATCGCTATG TCGCGCGGTA TGCAGATGCA GCCCGATTTT 

301 ATCGAGCCGA CACCGTGGAC GCTTGCCGGT TTGGGCTTCC TGATCGCGCT 

351 GATGGGCTGG ATGCCCGCGC CGATCGAAAT TTCCGCCATC AATTCTTTGT 

401 GGGTAACCGA AAAACAACGC ATCAATCCTT CTGAATACCG CGACGGGATT 

451 TTCGATTTCA ACGTCGGTTA TATCGCcagT GCGGTTTTGG CTTTGGTTTT 

501 CCTTGCACTG GGCGCGTTTG TGCAATACGG CAACGGCGAA GCAGTGCAGA 

551 TGGCGGGCGG CAAATATATC GGGCAATTGA TTAATATGTA TGCCGTAACC 

601 ATCGGCGGCT GGTCTCGTCC GCTGGTGGCG TTTATCGCGT TTGCCTGTAT 

651 GTACGGCACG ACGATTACCG TTGTGGACGG TTATGCGCGT GCCATTGCCG 

701 AACCCGTGCG CCTGCTGCGC GGCAGGGATA AAACCGGCAA CGCCGAGTTG 

751 TTtgccTGGA ATATTTGGGT GGCGGGCAGC GGTTTGGCGG TGATTTTCTG 

801 GTTTGACggc gcaaTGGCgG AACtgcTCAA ATTTGCGATG ATtgccgcCT 

851 TTGTGTCCGC CCCTGTGTTC GCCTGGCTCA ACTACCGCCT CGTCAAAGGG 

901 GACAAACGCC ACAGGCTTAC CGCCGGTATG AACGCCCTTG CCATTGTCGG 

951 CCTGCTCTAC CTGGCCGGGT TTGCCGTTTT GTTCCTGTTG AACCTTACCG 

1001 GACTTTTGGC ATAG 

This corresponds to the amino acid sequence <SEQ ID 486; ORF53ng-l>: 

1 ..KKSCVYLWVF LILCIASATI NAGAVAIVTA AIVKMAIPSL MFDAGTVAAL 

51 IMASCLIILV SGRYRALDRV SKIIIVTLSI ATLAAAGIAM SRGMQMQPDF 

101 IEPTPWTLAG LGFLIALMGW MPAPIEISAI NSLWVTEKQR INPSEYRDGI 

151 FDFNVGYIAS AVLALVFLAL GAFVQYGNGE AVQMAGGKYI GQLINMYAVT 

201 IGGWSRPLVA FIAFACMYGT TITVVDGYAR AIAEPVRLLR GRDKTGNAEL 

251 FAWNIWVAGS GLAVIFWFDG AMAELLKFAM IAAFVSAPVF AWLNYRLVKG 

301 DKRHRLTAGM NALAIVGLLY LAGFAVLFLL NLTGLLA* 

ORF53ng-l and ORF53-1 show 94.0% identity in 336 aa overlap: 

60 70 80 90 100 110 

orf 53-1 . pep ILTNLFKYPFFRFSAHYTLDTGKSLIEGYAEKSRVYLWVFLILCILSATINAGAVAIVTA 

: I I I I I I I i I I M I I M I I I I I I I I [ I I 
orf53ng-l KKSCVYLWVFLILCIASATINAGAVAIVTA 

10 20 30 

120 130 140 150 160 170 

orf 53-1 . pep AIVKMAIPSLMFDAGTVAALIMASCL 1 1 LVSGRYRALDRVSKIIIVTLS I ATLAAAGIAM 
I I I I I I I 1 I I I I I I I i I I I I I I I i M I I I I I I I I 1 I t I I I t i I I I I I I I I I I 1 I I t t I I i 
orf53ng-l AIVKMAIPSLMFDAGTVAALIMASCL II LVSGRYRALDRVSKI I IVTLSIATLAAAGI AM 
40 50 60 70 80 90 

180 190 200 210 220 230 

orf 53-1 . pep SRGMQMQSDFIEPTPWTLAGLGFLIALMGWMPAPIEISAINSLWVTEKQRINPSEYRDGI 
I I i I I I I I I I I I t I i I I I I I I It I 1 I It i I I I t I t I I I I 1 I I t I I I I t I I I t I I i M I I 
orf53ng-l SRGMQMQPDFIEPTPWTLAGLGFLIALMGWMPAPIEISAINSLWVTEKQRINPSEYRDGI 
100 110 120 130 140 150 

240 250 260 270 280 290 

orf 53-1 . pep FDFNVGYIASAVLALVFLALGAFVQYGNGEAVQMAGGKYIGQLINMYAVT IGGWSRPLVA 

i I I I I I I I I I 1 I i t I I I 1 1 I I 1 1 1 I I I I I I 1 1 I I I I t I I I I I I 1 1 i I I I I M I t M I I I t 

orf53ng-l FDFNVGYIASAVLALVFLALGAFVQYGNGEAVQMAGGKYIGQLINMYAVTIGGWSRPLVA 
160 170 180 190 200 210 

300 310 320 330 340 350 

orf 53-1 . pep FIAFACMYGTTITWDGYARAIAEPVRLLRGKDKTGNAEFFAWNIWVAGSGLAVIFWFDG 
I I I I I I I I I I t I t t I I I I I I M I M I t I I I I : I I I i I I I : I I I I i I I I I I I t I I I I I I t i 
orf53ng-l FIAFACMYGTTITWDGYARAIAEPVRLLRGRDKTGNAELFAWNIWVAGSGLAVIFWFDG 

220 230 240 250 260 270 

360 370 380 390 400 410 

or f 53-1 . pep VMANLLKFAMIAAFVSAPVFAWLNYRLVKGDEKHKLTSGMNALALAGLIYLTGFTVLFLL 
: I I : I I I i I 1 I I I [ I I I I I I I I I I I I I I I I I 1 : I I : I I I I I I ::) I : I I : I I : t I I I I 
orf53ng-l AMAELLKFAM I AAFVSAPVFAWLNYRLVKGDKRHRLTAGMNALAIVGLLY LAGFAVLFLL 

280 290 300 310 320 330 



orf 53-1 .pep 
orf53ng-l 
Based on this analysis, including the presence of a putative leader sequence (double-underlined) 
and several putative transmembrane domains (single-underlined) in the gonococcal protein, it is 
predicted that the proteins from N.meningitidis and N.gonorrhoeae, and their epitopes, could be 
useful antigens for vaccines or diagnostics, or for raising antibodies. 

Example 58 

The following partial DNA sequence was identified in N.meningitidis <SEQ ID 487>: 

1 . . TTGCGGGAAA CGGCATATGT TTTGGATAGT TTTGATCGTT ATTTTGTTGT 
51 TGCGCTTGCC GGCTTGTTTT TTGTCCGCGC ACAATCCGAA CGCGAGTGGA 
101 TGCGCGAGGT TTCTGCGTGG CAGGAAAAGA AAGGGGAAAA ACAGGCGGAG 
151 CTGCCTGAAA TCAAAGACGG TATGCCCGAT TTTCCCGAAC TTGCCCTGAT 
201 GCTTTTCCAC GCCGTCAAAA CGGCAGTGTA TTGGCTGTTT GTCGGTGTCG 
251 TCCGTTTCTG CCGAAACTAT CTGGCGCACG AATCCGAACC GGACAGGCCC 
301 GTTCCGCCT. . 

This corresponds to the amino acid sequence <SEQ ID 488; ORF58>: 

1 . . LRETAYVLDS FDRYFVVALA GLFFVRAQSE REWMREVSAW QEKKGEKQAE
51 LPEIKDGMPD FPELALMLFH AVKTAVYWLF VGVVRFCRNY LAHESEPDRP
101 VPP. . 
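The correspondence between the partial DNA sequence <SEQ ID 487> and the amino acid sequence <SEQ ID 488> can be checked by conceptual translation. The following is a minimal Python sketch, assuming the standard genetic code and reading frame 1. The name `translate` and the way the codon table is built are illustrative choices, not part of the original disclosure. The partial ORF begins with TTG, which is read here as leucine, as in SEQ ID 488.

```python
# Standard genetic code (NCBI translation table 1), bases ordered T, C, A, G.
BASES = "TCAG"
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    a + b + c: AMINO_ACIDS[16 * i + 4 * j + k]
    for i, a in enumerate(BASES)
    for j, b in enumerate(BASES)
    for k, c in enumerate(BASES)
}


def translate(dna: str) -> str:
    """Translate DNA in frame 1; codons with ambiguous bases (e.g. N) become 'X'."""
    # Drop the blanks that separate the 10-nucleotide blocks of the listing.
    seq = "".join(dna.split()).upper()
    usable = len(seq) - len(seq) % 3  # ignore a trailing incomplete codon
    codons = (seq[i:i + 3] for i in range(0, usable, 3))
    return "".join(CODON_TABLE.get(codon, "X") for codon in codons)


# First 30 nucleotides of the partial sequence <SEQ ID 487>.
print(translate("TTGCGGGAAA CGGCATATGT TTTGGATAGT"))  # LRETAYVLDS
```

The printed residues match the first block of SEQ ID 488. Stop codons are shown as `*`, following the convention used in the listings.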

Further work revealed the complete nucleotide sequence <SEQ ID 489>: 

1 ATGTTTTGGA TAGTTTTGAT CGTTATTTTG TTGCTTGCGC TTGCCGGCTT 

51 GTTTTTTGTC CGCGCACAAT CCGAACGCGA GTGGATGCGC GAGGTTTCTG 

101 CGTGGCAGGA AAAGAAAGGG GAAAAACAGG CGGAGCTGCC TGAAATCAAA 

151 GACGGTATGC CCGATTTTCC CGAACTTGCC CTGATGCTTT TCCATGCCGT 

201 CAAAACGGCA GTGTATTGGC TGTTTGTCGG TGTCGTCCGT TTCTGCCGT^A 

251 ACTATCTGGC GCACGAATCC GAACCGGACA GGCCCGTTCC GCCTGCTTCT 

301 GCAAACCGTG CGGATGTTCC GACCGCATCC GACGGATATT CAGACAGTGG 

351 AAACGGGACG GAAGAAGCGG AAACGGAAGA AGCAGAAGCT GCGGAGGAAG 

401 AGGCTGCCGA TACGGAAGAC ATTGCAACTG CCGTAATCGA CAACCGCCGC 

451 ATCCCATTCG ACCGGAGTAT TGCTGAAGGG TTGATGCCGT CTGAAAGCGA 

501 AATTTCGCCC GTCCGTCCGG TTTTTAAAGA AATCACTTTG GAAGAAGCAA 

551 CGCGTGCTTT AAACAGCGCG GCTTTAAGGG AAACGAAAAA ACGCTATATC 

601 GATGCATTTG AGAAAAACGA AACAGCGGTC CCCAAAGTCC GCGTGTCCGA 

651 TACCCCGATG GAAGGGCTGC AGATTATCGG TTTGGACGAC CCTGTGCTTC 

701 AACGCACGTA TTCCCATATG TTCGATGCGG ACAAAGAAGC GTTTTCCGAG 

751 TCTGCGGATT ACGGATTTGA GCCGTATTTT GAGAAGCAGC ATCCGTCTGC 

801 CTTTTCTGCA GTCAAAGCCG AAAATGCACG GAATGCGCCG TTCCACCGTC 

851 ATGCAGGGCA GGGGAAAGGG CAGGCGGAGG CAAAATCCCC GGATGTTTCC 

901 CAAGGGCAGT CCGTTTCAGA CGGCACGGCC GTCCGCGATG CCCGCCGCCG 

951 CGTTTCCGTC AATTTGAAAG AACCGAACAA GGCAACGGTT TCTGCGGAGG 

1001 CGCGAATTTC TCGCCTGATT CCGGAAAGTC AGACGGTTGT GGGGAAAGGG 

1051 GATGTCGAAA TGCCGTCTGA AACCGAAAAT GTTTTCACGG AAACCGTTTC 

1101 GTCTGTGGGA TACGGCGGTC CGGTTTATGA TGAAACTGCC GATATCCATA 

1151 TTGAAGAACC TGCCGCGCCC GATGCTTGGG TGGTCGAACC ACCCGAAGTG

1201 CCGAAAGTTC CCATGACCGC AATCGATATT CAGCCGCCGC CTCCCGTATC 

1251 GGAAATCTAC AACCGTACCT ATGAACCGCC GTCAGGATTC GAGCAGGTGC 

1301 AACGCAGCCG CATTGCCGAG ACCGACCATC TTGCCGATGA TGTTTTGAAT 

1351 GGAGGTTGGC AGGAGGAAAC CGCCGCTATT GCGGATGACG GCAGTGAAGG 

1401 TGCGGCAGAG CGGTCAAGCG GGCAATATCT GTCGGAAACC GAAGCGTTCG 

1451 GGCATGACAG TCAGGCGGTT TGTCCGTTTG AAAATGTGCC GTCTGAACGC 

1501 CCGTCCTGCC GGGTATCGGA TACGGAAGCG GATGAAGGGG CGTTCCCATC 

1551 TGAAGAAACC GGTGCGGTAT CCGAACACCT GCCGACAACC GACCTGCTTC

1601 TGCCTCCGCT GTTCAATCCC GAGGCGACGC AAACGGAAGA AGAACTGTTG 

1651 GAAAACAGCA TCACCATCGA AGAAAAATTG GCGGAGTTCA AAGTCAAGGT 

1701 CAAGGTTGTC GATTCTTATT CCGGCCCCGT AATTACGCGT TATGAAATCG 

1751 AACCCGATGT CGGCGTGCGC GGCAATTCCG TTCTGAATCT GGAAAAAGAT 

1801 TTGGCGCGTT CGCTCGGCGT GGCTTCCATC CGCGTTGTCG AAACCATCCC 

1851 CGGCAAAACC TGCATGGGTT TGGAACTTCC GAACCCGAAA CGCCAAATGA 
1901 TACGCCTGAG CGAAATCTTC AATTCGCCCG AGTTTGCCGA ATCCAAATCC

1951 AAGCTGACGC TCGCGCTCGG TCAGGACATC ACCGGACAGC CCGTCGTAAC 

2001 CGACTTGGGA AAAGCACCGC ATTTGTTGGT TGCCGGCACG ACCGGTTCGG 

2051 GCAAATCGGT GGGTGTCAAC GCGATGATTC TGTCTATGCT TTTCAAAGCC 

2101 GCGCCGGAAG ACGTGCGTAT GATTATGATC GATCCGAAAA TGCTGGAATT 

2151 GAGCATTTAC GAAGGCATCC CGCACCTGCT CGCCCCTGTC GTTACCGATA 

2201 TGAAGCTGGC GGCAAACGCG CTGAACTGGT GTGTTAACGA AATGGAAAAA 

2251 CGCTACCGCC TGATGAGCTT TATGGGCGTG CGTAATCTTG CGGGCTTCAA 

2301 TCAAAAAATC GCCGAAGCCG CAGCAAGGGG AGAAAAAATC GGCAATCCGT 

2351 TCAGCCTCAC GCCCGACGAT CCCGAACCTT TGGAAAAACT GCCGTTTATC 

2401 GTGGTCGTGG TCGATGAGTT TGCCGACCTG ATGATGACGG CAGGCAAGAA 

2451 AATCGAAGAA CTGATTGCCC GCCTCGCCCA AAAAGCCCGC GCGGCAGGCA 

2501 TCCATTTGAT TCTTGCCACA CAACGCCCCA GCGTCGATGT CATCACGGGT 

2551 CTGATTAAGG CGAACATCCC GACGCGTATC GCGTTCCAAG TGTCCAGCAA 

2601 AATCGACAGC CGCACGATTC TCGACCAAAT GGGCGCGGAA AACCTGCTCG 

2651 GTCAGGGCGA TATGCTGTTC CTGCTGCCGG GTACTGCCTA TCCGCAGCGC 

2701 GTTCACGGCG CGTTTGCCTC GGATGAAGAG GTGCACCGCG TGGTCGAATA 

2751 TTTGAAACAG TTTGGCGAAC CGGACTATGT TGACGATATT TTGAGCGGCG 

2801 GCGGCAGCGA AGAGCTGCCC GGCATCGGGC GCAGCGGCGA CGACGAAACC 

2851 GATCCGATGT ACGACGAGGC CGTATCCGTT GTCCTGAAAA CGCGCAAAGC

2901 CAGCATTTCG GGCGTACAGC GCGCCTTGCG TATCGGCTAC AACCGCGCCG

2951 CGCGTCTGAT TGACCAGATG GAGGCGGAAG GCATTGTGTC CGCACCGGAA 

3001 CACAACGGCA ACCGTACGAT TCTCGTCCCC TTGGACAATG CTTGA 

This corresponds to the amino acid sequence <SEQ ID 490; ORF58-1>:



1 MFWIVLIVIL LLALAGLFFV RAQSEREWMR EVSAWQEKKG EKQAELPEIK

51 DGMPDFPELA LMLFHAVKTA VYWLFVGVVR FCRNYLAHES EPDRPVPPAS

101 ANRADVPTAS DGYSDSGNGT EEAETEEAEA AEEEAADTED IATAVIDNRR

151 IPFDRSIAEG LMPSESEISP VRPVFKEITL EEATRALNSA ALRETKKRYI

201 DAFEKNETAV PKVRVSDTPM EGLQIIGLDD PVLQRTYSHM FDADKEAFSE

251 SADYGFEPYF EKQHPSAFSA VKAENARNAP FHRHAGQGKG QAEAKSPDVS

301 QGQSVSDGTA VRDARRRVSV NLKEPNKATV SAEARISRLI PESQTVVGKR

351 DVEMPSETEN VFTETVSSVG YGGPVYDETA DIHIEEPAAP DAWVVEPPEV

401 PKVPMTAIDI QPPPPVSEIY NRTYEPPSGF EQVQRSRIAE TDHLADDVLN

451 GGWQEETAAI ADDGSEGAAE RSSGQYLSET EAFGHDSQAV CPFENVPSER

501 PSCRVSDTEA DEGAFPSEET GAVSEHLPTT DLLLPPLFNP EATQTEEELL

551 ENSITIEEKL AEFKVKVKVV DSYSGPVITR YEIEPDVGVR GNSVLNLEKD

601 LARSLGVASI RVVETIPGKT CMGLELPNPK RQMIRLSEIF NSPEFAESKS

651 KLTLALGQDI TGQPVVTDLG KAPHLLVAGT TGSGKSVGVN AMILSMLFKA

701 APEDVRMIMI DPKMLELSIY EGIPHLLAPV VTDMKLAANA LNWCVNEMEK

751 RYRLMSFMGV RNLAGFNQKI AEAAARGEKI GNPFSLTPDD PEPLEKLPFI

801 VVVVDEFADL MMTAGKKIEE LIARLAQKAR AAGIHLILAT QRPSVDVITG

851 LIKANIPTRI AFQVSSKIDS RTILDQMGAE NLLGQGDMLF LLPGTAYPQR

901 VHGAFASDEE VHRVVEYLKQ FGEPDYVDDI LSGGGSEELP GIGRSGDDET

951 DPMYDEAVSV VLKTRKASIS GVQRALRIGY NRAARLIDQM EAEGIVSAPE

1001 HNGNRTILVP LDNA*



Computer analysis of this amino acid sequence predicts the indicated transmembrane region, and 
also gave the following results: 

Homology with a predicted ORF from N.meningitidis (strain A)

ORF58 shows 96.6% identity over an 89aa overlap with an ORF (ORF58a) from strain A of
N.meningitidis:



                    10        20        30        40        50        60
orf58.pep   LRETAYVLDSFDRYFVVALAGLFFVRAQSEREWMREVSAWQEKKGEKQAELPEIKDGMPD
                          :::|||||||||||||||||||||||||||||||||||||||||||
orf58a           MFWIVLIVILLLALAGLFFVRAQSEREWMREVSAWQEKKGEKQAELPEIKDGMPD
                         10        20        30        40        50

                    70        80        90       100
orf58.pep   FPELALMLFHAVKTAVYWLFVGVVRFCRNYLAHESEPDRPVPP
            |||||||||||||||||||||||||||||||||||||||||||
orf58a      FPELALMLFHAVKTAVYWLFVGVVRFCRNYLAHESEPDRPVPPASANRADVPTASDGYSD
               60        70        80        90       100       110
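The identity figure quoted for ORF58 and ORF58a is the number of matching columns divided by the overlap length. The sketch below is a minimal illustration of that arithmetic, assuming two pre-aligned strings of equal length with `-` marking gaps; the function name `percent_identity` is hypothetical. The count of 86 matches is an inference (it is the only whole number of matches in 89 columns that rounds to 96.6%) and is not stated in the text.

```python
def percent_identity(a: str, b: str) -> float:
    """Percent identity over the aligned columns of two equal-length strings.

    Columns in which either sequence has a gap ('-') are skipped.
    """
    if len(a) != len(b):
        raise ValueError("aligned sequences must have equal length")
    pairs = [(x, y) for x, y in zip(a, b) if x != "-" and y != "-"]
    if not pairs:
        raise ValueError("no aligned columns to compare")
    matches = sum(x == y for x, y in pairs)
    return 100.0 * matches / len(pairs)


# Residues 231-240 of ORF58-1 and of ORF58a differ at one position (H vs R).
print(percent_identity("PVLQRTYSHM", "PVLQRTYSRM"))  # 90.0

# 86 matching residues in an 89aa overlap give the reported figure.
print(round(100.0 * 86 / 89, 1))  # 96.6
```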






The complete length ORF58a nucleotide sequence <SEQ ID 491> is:

1 ATGTTTTGGA TAGTTTTGAT CGTTATTTTG TTGCTTGCGC TTGCCGGCTT 

51 GTTTTTTGTC CGCGCACAAT CCGAACGCGA GTGGATGCGC GAGGTTTCTG 

101 CGTGGCAGGA AAAGAAAGGG GAAAAACAGG CGGAGCTGCC TGAAATCAAA 

151 GACGGTATGC CCGATTTTCC CGAACTTGCC CTGATGCTTT TCCATGCCGT 

201 CAAAACGGCA GTGTATTGGC TGTTTGTCGG TGTCGTCCGT TTCTGCCGT^ 

251 ACTATCTGGC GCACGAATCC GAACCGGACA GGCCCGTTCC GCCTGCTTCT 

301 GCAAATCGTG CGGATGTTCC GACCGCATCC GACGGATATT CAGACAGTGG 

351 AAACGGGACG GAAGAAGCGG AAACGGAAGA AGCAGAAGCT GCGGAGGAAG 

401 AGGCTGCCGA TACGGAAGAC ATTGCAACTG CCGTAATCGA CAACCGCCGC 

451 ATCCCATTCG ACCGGAGTAT TGCTGAAGGG TTGATGCCGT CTGAAAGCGA 

501 AATTTCGCCC GTCCGTCCGG TTTTTAAGGA AATCACTTTG GAAGAAGCAA 

551 CGCGTGCTTT AAACAGCGCG GCTTTAAGGG AAACGAAAAA ACGCTATATC

601 GATGCATTTG AGAAAAACGA AACAGCGGTC CCCAAAGTCC GCGTGTCCGA 

651 TACCCCGATG GAAGGGCTGC AGATTATCGG TTTGGACGAC CCTGTGCTTC 

701 AACGCACGTA TTCCCGTATG TTCGATGCGG ACAAAGAAGC GTTTTCCGAG 

751 TCTGCGGATT ACGGATTTGA GCCGTATTTT GAGAAGCAGC ATCCGTCTGC 

801 CTTTTCTGCA GTCAAAGCCG AAAATGCACG GAATGCGCCG TTCCGCCGTC 

851 ATGCAGGGCA GGGNAAAGGG CAGGCGGAGG CNAAATCCCC GGATGTTTCC 

901 CAAGGGCAGT CCGTTTCAGA CGGCACAGCC GTCCGCGATG CCNGCCGCCG 

951 CGTTTCCGTC AATTTGAAAG AACCGAACAA GGCAACGGTT TCTGCGGAGG 

1001 CGCGGATTTC GCGCCTGATT CCGGAAAGTC GGACGGTTGT CGGGAAACGG 

1051 GATGTCGAAA TGCCGTCTGA AACCGAAAAT GTTTTCACGG AAANTGTTTC 

1101 GTCTGTGGGA TACGGCGNTC CGGTTTATGA TGAAACTGCC GATATCCATA 

1151 TTGAAGAACC TGCCGCGCCC GATGCTTGGG TGGTCGAACC ACCCGAAGTG 

1201 CCGAAAGTTC CCATGCCCGC AATNGATATT CCGCCGCCGC CTCCCGTATC 

1251 GGAAATCTAC AACCGTACCT ATGAACCGCC GGCAGGATTC GAGCAGGTGC 

1301 AACGCAGCCG CATTGCCGAA ACCGATCATC TTGCCGATGA TGTTTTGAAT 

1351 GGAGGTTGGC AGGAGGAAAC CGCCGCTATT GCGAATGACG GCAGTGAGGG 

1401 TGTGGCAGAG CGGTCAAGCG GGCAATATTT GTCGGAAACC GAAGCGTTCG 

1451 GGCATGACAG TCAGGCGGTT TGTCCGTTTG AAAATGTGCC GTCTGAACGC 

1501 CCGTCCCGCC GGGCATNGGA TACGGAAGCG GATGAAGGGG CGTTCCAATC 

1551 TGAAGAAACC GGTGCGGTAT CCGAACACCT GCCGACAACC GACCTGCTTC 

1601 TGCCGCCGCT GTTCAATCCC GGGGCGACGC AAACGGAAGA AGANCTGTTG 

1651 GANAACAGCA TCACCATCGA AGAAAAATNG GCGGAGTTCA AAGTCAAGGT 

1701 CAAGGTTGTC GATTCTTATT CCGGCCCCGT GATTACGCGT TATGAAATCG 

1751 AACCCGATGT CGGCGTGCGC GGCAATTCCG TTCTAAATCT GGAAAAAGAN 

1801 TTGGCGCGTT CGCTCGGCGT GGCTTCCATC CGCGTTGTCG AAACCATCCT 

1851 CGGCAAAACC TGTATGGGTT TGGAACTTCC GAACCCGAAA CGCCAAATGA 

1901 TACGCCTGAG CGAAATCTTC AATTCGCCCG AGTTTGCCGA ATCCAAATCC 

1951 AAGCTGACGC TCGCGCTCGG TCAGGACATC ACCGGACAGC CCGTCGTAAC 

2001 CGACTTGGGC AAAGCACCGC ATTTGTTGGT TGCCGGCACG ACCGGTTCGG 

2051 GCAAATCGGT GGGTGTCAAC GCGATGATTC TGTCTATGCT TTTCAAAGCC 

2101 GCGCCGGAAG ACGTGCGTAT GATTATGATC GATCCGAAAA TGCTGGAATT 

2151 GAGCATTTAC GAAGGCATCC CGCACCTGCT CGCCCCTGTC GTTACCGATA 

2201 TGAAGCTGGC GGCAAACGCG CTGAACTGGT GTGTTAACGA AATGGAAAAA 

2251 CGCTACCGCC TGATGAGCTT TATGGGCGTG CGCAATCTTG CGGGTNTCAA 

2301 TCAAAAAATC GCCGAAGCCG CAGCAAGGGG GGAGAAAATC GGCAACCCGT 

2351 TCAGCCTCAC GCCCGACAAT CCCGAACCTT TGGANAAATT GCCGTTTATC 

2401 GTGGTCGTGG TTGATGAGTT TGCCGACCTG ATGATGACGG CAGGCAAGAA 

2451 AATCGAAGAA CTGATTGCCC GCCTCGCCCA AAAAGCCCGC GCGGCAGGCA 

2501 TCCATCTTAT CCTTGCCACA CAACGCCCCA GTGTCGATGT CATCACGGGT 

2551 CTGATTAAGG CGAACATCCC GACGCGTATC GCGTTCCAAG TGTCCAGCAA 

2601 AATCGACAGC GGCAGGATTC TTGACCAAAT GGGTGCGGAA AACCTGCTCG 

2651 GGCAGGGCGA TATGCTGTTC CTGCCGCCGG GTACGGCCTA TCCGCAGCGC 

2701 GTTCACGGCG CGTTTGCCTC GGATGAAGAG GTGCACCGCG TGGTCGAATA 

2751 TCTGAAACAG TTTGGCGAAC CGGACTATGT TGACGATATN TTGAGCGGCG 

2801 GTATGTCCGA CGATTTGCTG GGAATCAGCC GGAGCGGCGA CGGCGAAACC 

2851 GATCCGATGT ACGACGAGGC CGTGTCNGTT GTTTTGAAAA CGCGCAAAGC 

2901 CAGCATTTCT GGCGTGCAGC GCGCATTGCG TATCGGCTAT AATCGCGCCG 

2951 CGCGTCTGAT TGACCAGATG GAGGCGGAAG GCATTGTGTC CGCACCGGAA 

3001 CACAACGGCA ACCGTACGAT TCTCGTCCCC TTNGACAATG CTTGA 

This encodes a protein having amino acid sequence <SEQ ID 492>: 

1 MFWIVLIVIL LLALAGLFFV RAQSEREWMR EVSAWQEKKG EKQAELPEIK

51 DGMPDFPELA LMLFHAVKTA VYWLFVGVVR FCRNYLAHES EPDRPVPPAS

101 ANRADVPTAS DGYSDSGNGT EEAETEEAEA AEEEAADTED IATAVIDNRR

151 IPFDRSIAEG LMPSESEISP VRPVFKEITL EEATRALNSA ALRETKKRYI

201 DAFEKNETAV PKVRVSDTPM EGLQIIGLDD PVLQRTYSRM FDADKEAFSE

251 SADYGFEPYF EKQHPSAFSA VKAENARNAP FRRHAGQGKG QAEAKSPDVS

301 QGQSVSDGTA VRDAXRRVSV NLKEPNKATV SAEARISRLI PESRTVVGKR

351 DVEMPSETEN VFTEXVSSVG YGXPVYDETA DIHIEEPAAP DAWVVEPPEV

401 PKVPMPAXDI PPPPPVSEIY NRTYEPPAGF EQVQRSRIAE TDHLADDVLN

451 GGWQEETAAI ANDGSEGVAE RSSGQYLSET EAFGHDSQAV CPFENVPSER

501 PSRRAXDTEA DEGAFQSEET GAVSEHLPTT DLLLPPLFNP GATQTEEXLL

551 XNSITIEEKX AEFKVKVKVV DSYSGPVITR YEIEPDVGVR GNSVLNLEKX

601 LARSLGVASI RVVETILGKT CMGLELPNPK RQMIRLSEIF NSPEFAESKS

651 KLTLALGQDI TGQPVVTDLG KAPHLLVAGT TGSGKSVGVN AMILSMLFKA

701 APEDVRMIMI DPKMLELSIY EGIPHLLAPV VTDMKLAANA LNWCVNEMEK

751 RYRLMSFMGV RNLAGXNQKI AEAAARGEKI GNPFSLTPDN PEPLXKLPFI

801 VVVVDEFADL MMTAGKKIEE LIARLAQKAR AAGIHLILAT QRPSVDVITG

851 LIKANIPTRI AFQVSSKIDS RTILDQMGAE NLLGQGDMLF LPPGTAYPQR

901 VHGAFASDEE VHRVVEYLKQ FGEPDYVDDX LSGGMSDDLL GISRSGDGET

951 DPMYDEAVSV VLKTRKASIS GVQRALRIGY NRAARLIDQM EAEGIVSAPE

1001 HNGNRTILVP XDNA*



ORF58a and ORF58-1 show 96.6% identity in 1014 aa overlap: 






10 20 30 40 50 60 

or f 58a . pep MFWIVLIVILLLALAGLFFVRAQSEREWMREVSAWQEKKGEKQAELPEIKDGMPDFPELA 

tllflllltlliMIIIIIMIIMMMIIillilttlltlllltltllllllllilll 

orf58-l MFWIVLIVILLLALAGLFFVRAQSEREWMREVSAWQEKKGEKQAELPEIKDGMPDFPELA 
10 20 30 40 50 60 

70 80 90 100 110 120 

or f 58a . pep LMLFHAVKTAVYWLFVGWRFCRNYLAHESEPDRPVPPASANRADVPTASDGYSDSGNGT 
illtlllMIIMIIIIIIIIIIIiltllllllMilllllMMIIIIIIIIIIIini 
or f 58 -1 LMLFHAVKTAVYWLFVGWRFCRNYLAHESEPDRPVPPASANRADVPTASDGYSDSGNGT 

70 80 90 100 110 120 

130 140 150 160 170 180 

orf 58a . pep EEAETEEAEAAEEEAADTEDIATAVIDNRRIPFDRSIAEGLMPSESEISPVRPVFKEITL 
lltlltMIIMIIIIIIIMIIIIIIIIIItllttlltlltlllllllllltlllMII 
orf 58-1 EEAETEEAEAAEEEAADTEDIATAVIDNRRIPFDRSIAEGLMPSESEISPVRPVFKEITL 

130 140 150 160 170 180 

190 200 210 220 230 240 

orf 58a . pep EEATRALNSAALRETKKRYIDAFEKNETAVPKVRVSDTPMEGLQIIGLDDPVLQRTYSRM 
llllllll!llllllllltllltltllliltlltilll[IIIIIIIIIIIMItlMt:| 
orf 58-1 EEATRALNSAALRETKKRYIDAFEKNETAVPKVRVSDTPMEGLQIIGLDDPVLQRTYSHM 

190 200 210 220 230 240 






250 260 270 280 290 300 

orf 58 a . pep FDADKEAFSESADYGFEPYFEKQHPSAFSAVKAENARNAPFRRHAGQGKGQAEAKSPDVS 
I I I t I I I I I I i I I t M I I I I I t I I I I M I I I I I i I [ 1 I I I I : I I I I I I I I I M I I I I t I I 
orf58-1       FDADKEAFSESADYGFEPYFEKQHPSAFSAVKAENARNAPFHRHAGQGKGQAEAKSPDVS

250 260 270 280 290 300 

310 320 330 340 350 360 

orf 58a . pep QGQSVSDGTAVRDAXRRVSVNLKEPNKATVSAEARISRLIPESRTWGKRDVEMPSETEN 
I i I M I t i I I I I I I llllllltlllMliMMtllll)llt:IIIIIIIIMIIIIII 
orf58-1       QGQSVSDGTAVRDARRRVSVNLKEPNKATVSAEARISRLIPESQTVVGKRDVEMPSETEN

310 320 330 340 350 360 






370 380 390 400 410 420 

orf 58 a . pep VFTEXVSSVGYGXPVYDETADIHIEEPAAPDAWWEPPEVPKVPMPAXDI PPPPPVSEIY 
illhlllllll I M I 1 I I I I I I I t i I M I I I I I I I I I I I I I t I I tl IIMIIIII 
orf 58-1 VFTETVSSVGYGGPVYDETADIHIEEPAAPDAWWEPPEVPKVPMTAIDIQPPPPVSEIY 

370 380 390 400 410 420 

430 440 450 460 470 480 

or f 58a . pep NRTYEPPAGFEQVQRSRIAETDHLADDVLNGGWQEETAAIANDGSEGVAERSSGQYLSET 
llllttl:|llllltllllllltlltllll{illlllllll:llllt:|IMMtllill 
orf 58-1 NRTYEPPSGFEQVQRSRIAETDHLADDVLNGGWQEETAAIADDGSEGAAERSSGQYLSET 
430 440 450 460 470 480 

490 500 510 520 530 540 

or f 58a . pep EAFGHDSQAVCPFENVPSERPSRRAXDTEADEGAFQSEETGAVSEHLPTTDLLLPPLFNP 
I t I I I I I ! t i I I I I I I I I I I I t I : I I It I I i I I I I I i I I I t I I I I I t I I I I I I I t I I 
orf 58-1 EAFGHDSQAVCPFENVPSERPSCRVSDTEADEGAFPSEETGAVSEHLPTTDLLLPPLFNP 

490 500 510 520 530 540 
550 560 570 580 590 600

orf 58a . pep GATQTEEXLLXNSITIEEKXAEFKVKVKWDSYSGPVITRYEIEPDVGVRGNSVLNLEKX 

Mini II II I II I II I I I I I I I I II I II I I I II II II I t II I I I I 1 I I i I t I I I 
orf 58-1 EATQTEEELLENSITIEEKLAEFKVKVKWDSYSGPVITRYEIEPDVGVRGNSVLNLEKD 

550 560 570 580 590 600 

610 620 630 640 650 660 

orf58a.pep    LARSLGVASIRVVETILGKTCMGLELPNPKRQMIRLSEIFNSPEFAESKSKLTLALGQDI
II I I 1 I I I It I I I I I I 11 t I II I I I I I II I I I II I I II M I II t I'l I t I I I I II I I I I I 
orf 58-1 LARSLGVASIRWETIPGKTCMGLELPNPKRQMIRLSEIFNSPEFAESKSKLTLALGQDI 

610 620 630 640 650 660 

670 680 690 700 710 720 

orf 58a . pep TGQPWTDLGKAPHLLVAGTTGSGKSVGVNAMILSMLFKAAPEDVRMIMIDPKMLELSIY 
II I I I I t M I 1 I i t II I I i I I 1 I I II I I I I II I I I I I I II I I I I I t I I I I I I II I I I I I I 
orf58-1       TGQPVVTDLGKAPHLLVAGTTGSGKSVGVNAMILSMLFKAAPEDVRMIMIDPKMLELSIY
670 680 690 700 710 720 

730 740 750 760 770 780 

orf 58a . pep EGIPHLLAPVVTDMKLAANALNWCVNEMEKRYRLMSFMGVRNLAGXNQKIAEAAARGEKI 
I II I II I I I i I I I I I II i I I I I I I I I I I I I I I I I I M I I I I I I I I I I I I II t II I I I I I 
orf58-1       EGIPHLLAPVVTDMKLAANALNWCVNEMEKRYRLMSFMGVRNLAGFNQKIAEAAARGEKI

730 740 750 760 770 780 

790 800 810 820 830 840 

or f 58a . pep GNPFSLTPDNPEPLXKLPFIWWDEFADLMMTAGKKIEELIARLAQKARAAGIHLILAT 
111111111:1111 I t M 1! I I I I I I I I I t I I I I I II M ) I I II I I I I I I I I I I I 1 I t I 
orf58-1       GNPFSLTPDDPEPLEKLPFIVVVVDEFADLMMTAGKKIEELIARLAQKARAAGIHLILAT

790 800 810 820 830 840 

850 860 870 880 890 900 

or f 58a . pep QRPSVDVITGLIKANIPTRIAFQVSSKIDSRTILDQMGAENLLGQGDMLFLPPGTAYPQR 
I I I I I I I I I I II i I I I I I I I I I I I I I II I I I I I I I II II I I I II II i M I I II I I I I I I 
orf58-l QRPSVDVITGLIKANIPTRIAFQVSSKIDSRTILDQMGAENLLGQGDMLFLLPGTAYPQR 

850 860 870 880 890 900 

910 920 930 940 950 960 

orf 58a . pep VHGAFASDEEVHRWEYLKQFGEPDYVDDXLSGGMSDDLLGISRSGDGETDPMYDEAVSV 
I I II I I I I t I I I I II I I I I I I I I I I II II I I I I I : : I I I : I II I I I I I t I I I i I I I 
orf 58-1 VHGAFASDEEVHRWEYLKQFGEPDYVDDILSGGGSEELPGIGRSGDDETDPMYDEAVSV 

910 920 930 940 950 960 

970 980 990 1000 1010 

orf 58a . pep VLKTRKASISGVQRALRIGYNRAARLIDQMEAEGIVSAPEHNGNRTILVPXDNAX 
I I I I I i I 1 I I I I I I I i I t I I I I I I I I II i II M t I I I I I I I I I I I I I I I I I I I I 
orf 58-1 VLKTRKASISGVQRALRIGYNRAARLIDQMEAEGIVSAPEHNGNRTILVPLDNAX 

970 980 990 1000 1010 

Homology with a predicted ORF from N.gonorrhoeae

ORF58 shows complete identity over a 9aa overlap with a predicted ORF (ORF58ng) from
N.gonorrhoeae:

orf58.pep   ALMLFHAVKTAVYWLFVGVVRFCRNYLAHESEPDRPVPP  103
                                          |||||||||
orf58ng                                   SEPDRPVPPASANRADVPTASDGYSDSGNG  30

The ORF58ng nucleotide sequence <SEQ ID 493> is predicted to encode a protein having partial 
amino acid sequence <SEQ ID 494>: 

1 . . SEPDRPVPPA SANRADVPTA SDGYSDSGNG TEEAETEAAE AAEEEAADTE

51 DIATAVIDNR RIPFDRSIAE GLMQSESKTS PVRPVFKEIT LEEATRALSS

101 AALRETKKRY IDAFEKNGTA VPKVRVSDTP MEGLQIIGLD DPVLQRTYSR

151 MFDADKEAFS ESADYGFEPY FEKQHPSAFS AVKAENARNA PFRRHAGQEK

201 GQAEAKSPDV SQGQSVSDGT AVRDARRRVS VNLKEPNKAT VSAEARISRL

251 IPESRTVVGK RDVEMPSETE NVFTETVSSV GYGGPVYDEA ADIHIEEPAA

301 PDAWVVEPPE VPEVAVPEID ILPPPPVSEI YNRTYEPPAG FEQAQRSRIA

351 ETDHLAADVL NGGWQEETAA IADDGSEGAA ERSSGQYLSE TEAFGHDSQA

401 VCPFEDVPSE RPSCRVSDTE ADEGAFQSEE TGAVSEHLPT TDLLLPPLFN

451 PEATQTEEEL LENSITIEEK LAEFKVKVKV VDSYSGPVIT RYEIEPDVGV

501 RGNSVLNLEK DLARSLGVAS IRVVETIPGK TCMGLELPNP KRQMIRLSEI

551 FNSPEFAESK SKLTLALGQD ITGQPVVTDL GKAPHLLVAG TTGSGKSVGV

601 NAMILSMLFK AAPEDVRMIM IDPKMLELSI YEGITHLLAP VVTDMKLAAN

651 ALNWCVNEME KRYRLMSFMG VRNLAGFNQK IAEAAARGEK IGNPFSLTPD

701 DPEPLEKLPF IVVVVDEFAD LMMTAGKKIE ELIARLAQKA RAAGIHLILA

751 TQRPSVDVIT GLIKANIPTR IAFQVSSKID SRTILDQMGA ENLLGQGDML

801 FLPPGTAYPQ RVHGAFASDE EVHRVVEYLK QFGEPDYVDD ILSGGGSEEL

851 PGIGRSGDGE TDPMYDEAVS VVLKTRKASI SGVQRALRIG YNRAARLIDQ

901 MEAEGIVSAP EHNGNRTILV PLDNA*

This partial gonococcal sequence contains a predicted transmembrane region and a predicted
ATP/GTP-binding site motif A (P-loop; double underlined). Furthermore, it has a domain
homologous to the FtsK cell division protein of E.coli. Alignment of ORF58ng and FtsK
(accession number P46889) shows 65% amino acid identity in a 459aa overlap:

ORF58ng:  467 IEEKLAEFKVKVKVVDSYSGPVITRYEIEPDVGVRGNSVLNLEKDLARSLGVASIRVVET 526
              +E +LA+F++K  VV+   GPVITR+E+    GV+   + NL +DLARSL   ++RVVE
FtsK:     868 VEARLADFRIKADVVNYSPGPVITRFELNLAPGVKAARISNLSRDLARSLSTVAVRVVEV 927

ORF58ng:  527 IPGKTCMGLELPNPKRQMIRLSEIFNSPEFAESKSKLTLALGQDITGQPVVTDLGKAPHL 586
              IPGK  +GLELPN KRQ + L E+ ++ +F ++ S LT+ LG+DI G+PVV DL K PHL
FtsK:     928 IPGKPYVGLELPNKKRQTVYLREVLDNAKFRDNPSPLTVVLGKDIAGEPVVADLAKMPHL 987

ORF58ng:  587 LVAGTTGSGKSVGVNAMILSMLFKAAPEDVRMIMIDPKMLELSIYEGITHLLAPVVTDMK 646
              LVAGTTGSGKSVGVNAMILSML+KA PEDVR IMIDPKMLELS+YEGI HLL  VVTDMK
FtsK:     988 LVAGTTGSGKSVGVNAMILSMLYKAQPEDVRFIMIDPKMLELSVYEGIPHLLTEVVTDMK 1047

ORF58ng:  647 LAANALNWCVNEMEKRYRLMSFMGVRNLAGFNQKIAEAAARGEKIGNPFSLTPDDPEP-- 704
               AANAL WCVNEME+RY+LMS +GVRNLAG+N+KIAEA      I +P+    D  +
FtsK:    1048 DAANALRWCVNEMERRYKLMSALGVRNLAGYNEKIAEADRMMRPIPDPYWKPGDSMDAQH 1107

ORF58ng:  705 --LEKLPFIVVVVDEFADLMMTAGKKIEELIARLAQKARAAGIHLILATQRPSVDVITGL 762
                L+K P+IVV+VDEFADLMMT GKK+EELIARLAQKARAAGIHL+LATQRPSVDVITGL
FtsK:    1108 PVLKKEPYIVVLVDEFADLMMTVGKKVEELIARLAQKARAAGIHLVLATQRPSVDVITGL 1167

ORF58ng:  763 IKANIPTRIAFQVSSKIDSRTILDQMGAENLLGQGDMLFLPPGTAYPQRVHGAFASDEEV 822
              IKANIPTRIAF VSSKIDSRTILDQ GAE+LLG GDML+  P +  P RVHGAF  D+EV
FtsK:    1168 IKANIPTRIAFTVSSKIDSRTILDQAGAESLLGMGDMLYSGPNSTLPVRVHGAFVRDQEV 1227

ORF58ng:  823 HRVVEYLKQFGEPDYVDDILSGGGSEELPGIGRSGDGETDPMYDEAVSVVLKTRKASISG 882
              H VV+  K  G P YVD I S   SE   G G  G  E DP++D+AV  V + RKASISG
FtsK:    1228 HAVVQDWKARGRPQYVDGITSDSESEGGAG-GFDGAEELDPLFDQAVQFVTEKRKASISG 1286

ORF58ng:  883 VQRALRIGYNRAARLIDQMEAEGIVSAPEHNGNRTILVP 921
              VQR  RIGYNRAAR+I+QMEA+GIVS   HNGNR +L P
FtsK:    1287 VQRQFRIGYNRAARIIEQMEAQGIVSEQGHNGNREVLAP 1325
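The ATP/GTP-binding site motif A (P-loop) noted for the gonococcal sequence can be located with a simple pattern search. The sketch below assumes the PROSITE definition of the motif (PS00017, `[AG]-x(4)-G-K-[ST]`) and expresses it as a regular expression. The function name `find_p_loops` is hypothetical, and the original analysis may have used a different tool.

```python
import re

# PROSITE PS00017 (ATP/GTP-binding site motif A, "P-loop"): [AG]-x(4)-G-K-[ST].
# The lookahead lets overlapping occurrences be reported as well.
P_LOOP = re.compile(r"(?=([AG].{4}GK[ST]))")


def find_p_loops(protein: str):
    """Return (1-based start, matched residues) for each P-loop match."""
    # Remove the blanks that separate the 10-residue blocks of the listing.
    seq = "".join(protein.split()).upper()
    return [(m.start() + 1, m.group(1)) for m in P_LOOP.finditer(seq)]


# Residues 581-600 of the partial ORF58ng sequence <SEQ ID 494>.
print(find_p_loops("GKAPHLLVAG TTGSGKSVGV"))  # [(10, 'GTTGSGKS')]
```

The match starts at the tenth residue of the segment, that is residue 590 of SEQ ID 494, within the region conserved between ORF58ng and FtsK (LVAGTTGSGKSVGVNAMILSML).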

Further work on ORF58ng revealed the complete gonococcal DNA sequence to be <SEQ ID 495>: 

1 ATGTTTTGGA TAGTTTTGAT CGTTATtgtg TTGCTTGCGC TTGCCGGCCT 

51 GTTTTTTGTC CGCGCACAAT CCGAACGCGA GTGGATGCGC GAGGTTTCTG 

101 CGTGGCAGGA AAAGAAAGGG GAAAAACAGG CGGAGCTGCC TGAAATCAAA 

151 GACGGTATGC CCGATTTTCC CGAGTTTTCC CTGATGCTTT TCCATGCCGT 

201 CAAAACGGCA GTGTATTGGC TGTTTGTCGG TGTCGTCCGT TTCTGCCGAA 

251 ACTATCTGGC GCACGAATCC GAACCGGACA GGCCCGTTCC GCCTGCTTCT 

301 GCAAACCGTG CGGATGTTCC GACCGCATCC GACGGGTATT CAGACAGTGG 

351 AAACGGGACG GAAGAAGCGG AAACGGAAGC AGCAGAAGCT GCGGAGGAAG 

401 AGGCTGCCgA TACgGAAGAC ATTGCAACTG CCGTAATCGA CAACCGCCGC 

451 ATCCcatTCG ACCGGAGTAT TGCTGAAGGG TTGATGCAGT CTGAAAGCAA 

501 AACTTCGCCC GTCCGTCCGG TTTTTAAGGA AATCACTTTG GAAGAAGCAA 

551 CGCGTGCTTT AAGCAGCGCG GCTTTAAGGG AAACGAAAAA ACGCTATATC 

601 GATGCATTTG AGAAAAACGG AACAGCCGTC CCCAAAGTAC GCGTGTCCGA 

651 TACCCCGATG GAAGGGCTGC AGATTATCGG TTTGGACGAC CCTGTGCTTC 

701 AACGCACGTA TTCCCGTATG TTTGATGCGG ACAAAGAAGC GTTTTCCGAG 

751 TCTGCGGATT ACGGATTTGA GCCGTATTTT GAGAAGCAGC ATCCGTCTGC 
801 CTTTTCTGCA GTGAAAGCCG AAAATGCACG GAATGCGCCG TTCCGCCGTC

851 ATGCAGGGCA GGAGAAAGGG CAGGCGGAGG CAAAATCCCC GGATGTTTCC 

901 CAAGGGCAGT CCGTTTCAGA CGGCACAGCC GTCCGCGATG CCCGCCGCCG 

951 CGTTTCCGTC AATTTGAAAG AACCGAACAA GGCAACGGTT TCTGCGGAGG 

1001 CGCGGATTTC GCGCCTGATT CCGGAAAGTC GGACGGTTGT CGGGAAACGG 

1051 GATGTCGAAA TGCCGTCTGA AACCGAAAAT GTTTTCACGG AAACCGTTTC 

1101 GTCTGTGGGA TACGGCGGTC CGGTTTATGA TGAAGCTGCC GATATCCATA 

1151 TTGAAGAGCC TGCCGCGCCC GATGCTTGGG TGGTCGAACC ACCCGAAGTG 

1201 CCGGAGGTAG CCGTACCCGA AATCGATATT CTGCCGCCGC CTCCCGTATC 

1251 GGAAATCTAC AACCGTACCT ATGAGCCGCC GGCAGGATTC GAGCAGGCGC 

1301 AACGCAGCCG CATTGCCGAA ACCGACCATC TTGCCGCTGA TGTTTTGAAT 

1351 GGAGGTTGGC AGGAGGAAAC CGCCGCTATT GCAGATGACG GCAGTGAGGG 

1401 TGCGGCAGAG CGGTCAAGCG GGCAATATCT GTCGGAAACC GAAGCGTTCG 

1451 GGCATGACAG TCAGGCGGTT TGTCCGTTTG AAGATGTGCC GTCTGAACGC 

1501 CCGTCCTGCC GGGTATCGGA TACGGAAGCG GATGAAGGGG CGTTCCAATC 

1551 GGAAGAGACC GGTGCGGTAT CCGAACACCT GCCGACAACC GACCTGCTTC 

1601 TGCCTCCGCT GTTCAATCCC GAGGCGACGC AAACCGAAGA AGAACTGTTG 

1651 GAAAACAGCA TCACCATCGA AGAAAAATTG GCGGAGTTCA AAGTCAAGGT 

1701 CAAGGTTGTC GATTCTTATT CCGGCCCCGT GATTACGCGT TATGAAATCG 

1751 AACCCGATGT CGGCGTGCGC GGCAATTCCG TTCTGAATTT GGAAAAAGAC 

1801 TTGGCGCGTT CGCTCGGCGT GGCTTCCATC CGCGTTGTCG AAACCATCCC 

1851 CGGCAAAACC TGCATGGGTT TGGAACTTCC GAACCCGAAA CGCCAAATGA 

1901 TACGCCTGAG CGAAATTTTC AATTCGCCCG AGTTTGCCGA ATCCAAATCC 

1951 AAGCTGACGC TCGCGCTCGG TCAGGACATT ACCGGACAGC CCGTCGTAAC 

2001 CGACTTGGGC AAAGCACCGC ATTTGCTGGT TGCCGGCACG ACCGGTTCGG 

2051 GCAAATCGGT GGGTGTCAAC GCGATGATTC TGTCTATGCT TTTCAAAGCC 

2101 GCGCCGGAAG ACGTGCGTAT GATTATGATC GATCCGAAAA TGCTGGAATT 

2151 GAGCATTTAC GAAGGCATCA CGCACCTGCT CGCCCCTGTC GTTACCGATA 

2201 TGAAGCTGGC GGCAAACGCG CTGAACTGGT GTGTTAACGA AATGGAAAAA 

2251 CGCTACCGCC TGATGAGCTT TATGGGCGTG CGCAATCTTG CGGGCTTCAA 

2301 CCAAAAAATC GCCGAAGCCG CAGCAAGGGG AGAAAAAATC GGCAATCCGT

2351 TCAGCCTCAC GCCCGACGAT CCCGAACCTT TGGAAAAACT GCCGTTTATC 

2401 GTGGTCGTGG TCGATGAGTT TGCCGATTTG ATGATGACGG CAGGCAAGAA 

2451 AATCGAAGAA CTGATTGCGC GCCTCGCCCA AAAAGCCCGC GCGGCAGGCA 

2501 TCCACCTTAT CCTTGCCACA CAACGCCCCA GCGTCGATGT CATCACGGGT 

2551 CTGATTAAGG CGAACATCCC GACGCGTATC GCGTTCCAAG TGTCCAGCAA 

2601 AATCGACAGC CGCACGATTC TCGACCAAAT GGGCGCGGAA AACCTGCTCG 

2651 GTCAGGGCGA TATGCTGTTC CTGCCGCCGG GTACTGCCTA TCCGCAGCGC 

2701 GTTCACGGCG CGTTTGCCTC GGATGAAGAG GTGCACCGCG TGGTCGAATA 

2751 TCTGAAGCAG TTTGGCGAGC CGGACTATGT TGACGATATT TTGAGCGGCG 

2801 GCGGCAGCGA AGAGCTGCCC GGCATCGGGC GCAGCGGCGA CGGCGAAACC 

2851 GATCCGATGT ACGACGAGGC CGTATCCGTT GTCCTGAAAA CGCGCAAAGC 

2901 CAGCATTTCG GGCGTACAGC GCGCCTTGCG CATCGGCTAC AACCGCGCCG 

2951 CGCGTCTGAT TGACCAAATG GAAGCGGAAG GCATTGTGTC CGCACCGGAA 

3001 CACAACGGCA ACCGTACGAT TCTCGTCCCC TTGGACAATG CTTGA 

This corresponds to the amino acid sequence <SEQ ID 496; ORF58ng-1>:

1 MFWIVLIVIV LLALAGLFFV RAQSEREWMR EVSAWQEKKG EKQAELPEIK

51 DGMPDFPEFS LMLFHAVKTA VYWLFVGVVR FCRNYLAHES EPDRPVPPAS

101 ANRADVPTAS DGYSDSGNGT EEAETEAAEA AEEEAADTED IATAVIDNRR

151 IPFDRSIAEG LMQSESKTSP VRPVFKEITL EEATRALSSA ALRETKKRYI

201 DAFEKNGTAV PKVRVSDTPM EGLQIIGLDD PVLQRTYSRM FDADKEAFSE

251 SADYGFEPYF EKQHPSAFSA VKAENARNAP FRRHAGQEKG QAEAKSPDVS

301 QGQSVSDGTA VRDARRRVSV NLKEPNKATV SAEARISRLI PESRTVVGKR

351 DVEMPSETEN VFTETVSSVG YGGPVYDEAA DIHIEEPAAP DAWVVEPPEV

401 PEVAVPEIDI LPPPPVSEIY NRTYEPPAGF EQAQRSRIAE TDHLAADVLN

451 GGWQEETAAI ADDGSEGAAE RSSGQYLSET EAFGHDSQAV CPFEDVPSER

501 PSCRVSDTEA DEGAFQSEET GAVSEHLPTT DLLLPPLFNP EATQTEEELL

551 ENSITIEEKL AEFKVKVKVV DSYSGPVITR YEIEPDVGVR GNSVLNLEKD

601 LARSLGVASI RVVETIPGKT CMGLELPNPK RQMIRLSEIF NSPEFAESKS

651 KLTLALGQDI TGQPVVTDLG KAPHLLVAGT TGSGKSVGVN AMILSMLFKA

701 APEDVRMIMI DPKMLELSIY EGITHLLAPV VTDMKLAANA LNWCVNEMEK

751 RYRLMSFMGV RNLAGFNQKI AEAAARGEKI GNPFSLTPDD PEPLEKLPFI

801 VVVVDEFADL MMTAGKKIEE LIARLAQKAR AAGIHLILAT QRPSVDVITG

851 LIKANIPTRI AFQVSSKIDS RTILDQMGAE NLLGQGDMLF LPPGTAYPQR

901 VHGAFASDEE VHRVVEYLKQ FGEPDYVDDI LSGGGSEELP GIGRSGDGET

951 DPMYDEAVSV VLKTRKASIS GVQRALRIGY NRAARLIDQM EAEGIVSAPE

1001 HNGNRTILVP LDNA*



ORF58ng-1 and ORF58-1 show 97.2% identity in 1014 aa overlap:

10 20 30 40 50 60

orf 58-1 . pep MFWIVLIVILLLALAGLFFVRAQSEREWMREVSAWQEKKGEKQAELPEIKDGMPDFPELA 
I I I I I I I I I : I 1 I I I I I I I i I I I I I I I I M I I M M I I I I I I I I t I I I I M I I I I I I I : : 
orf58ng-l MFWIVLIVIVLLALAGLFFVRAQSEREWMREVSAWQEKKGEKQAELPEIKDGMPDFPEFS 

10 20 30 40 50 60 

70 80 90 100 110 120 

orf58-1.pep    LMLFHAVKTAVYWLFVGVVRFCRNYLAHESEPDRPVPPASANRADVPTASDGYSDSGNGT
I I I I I I I I M I I I M I I M I I I I t t t t M I I I I I 1 t i I I t I I I I I I I I i I I I I I I I I I I i 
orf58ng-l LMLFHAVKTAVYWLFVGWRFCRNYLAHESEPDRPVPPASANRADVPTASDGYSDSGNGT 

70 80 90 100 110 120 

130 140 150 160 170 180 

orf 58-1 . pep EEAETEEAEAAEEEAADTEDIATAVIDNRRIPFDRSIAEGLMPSESEISPVRPVFKEITL 

I I I I I I I I I i I I I I I I I I I I I I I I I I I I I I I I I I I I I I t I I lit: I I I I I I I I I I It 
orf58ng-l EEAETEAAEAAEEEAADTEDIATAVIDNRRIPFDRSIAEGLMQSESKTSPVRPVFKEITL 

130 140 150 160 170 180 

190 200 210 220 230 240 

orf58-1.pep    EEATRALNSAALRETKKRYIDAFEKNETAVPKVRVSDTPMEGLQIIGLDDPVLQRTYSHM
t I I t It 1 : I I t I I I II I I 1 I I I I I I I I I I II I I I i I I 1 I I I I I t I I I M I I II II t I : I 
orf58ng-1      EEATRALSSAALRETKKRYIDAFEKNGTAVPKVRVSDTPMEGLQIIGLDDPVLQRTYSRM

190 200 210 220 230 240 

250 260 270 280 290 300 

orf 58-1 . pep FDADKEAFSESADYGFEPYFEKQHPSAFSAVKAENARNAPFHRHAGQGKGQAEAKSPDVS 

II III II II It Ml II I I INI nil I Mil I 111111111:111 II iillllllllll 
orf58ng-l FDADKEAFSESADYGFEPYFEKQHPSAFSAVKAENARNAPFRRHAGQEKGQAEAKSPDVS 

250 260 270 280 290 300 

310 320 330 340 350 360 

orf 58-1 . pep QGQSVSDGTAVRDARRRVSVNLKEPNKATVSAEARISRLIPESQTWGKRDVEMPSETEN 
I I II II I II I I I I I I I I I I I II I I I I I I I I I 1 I I I I I I i II I i : I I II I I I I I I I I I I I i 
orf58ng-l QGQSVSDGTAVRDARRRVSVNLKEPNKATVSAEARISRLIPESRTWGKRDVEMPSETEN 
310 320 330 340 350 360 

370 380 390 400 410 420 

orf 58-1 . pep VFTETVSSVGYGGPVYDETADIHIEEPAAPDAWWEPPEVPKVPMTAIDIQPPPPVSEIY 
Illll||lllllllllll:lillllllllllllllllllll:| : III lllllllll 
orf58ng-l VFTETVSSVGYGGPVYDEAADIHIEEPAAPDAWWEPPEVPEVAVPEIDILPPPPVSEIY 

370 380 390 400 410 420 

430 440 450 460 470 480 

orf 58-1 . pep NRTYEPPSGFEQVQRSRIAETDHLADDVLNGGWQEETAAIADDGSEGAAERSSGQYLSET 
I I t I I I I : I I I I : I I I I I I I I I I I I I I I I I I I I I I I M I I I I I I I I I I I t I I I I I I I I I 
orf58ng-l NRTYEPPAGFEQAQRSRIAETDHLAADVLNGGWQEETAAIADDGSEGAAERSSGQYLSET 

430 440 450 460 470 480 

490 500 510 520 530 540 

orf58-1.pep    EAFGHDSQAVCPFENVPSERPSCRVSDTEADEGAFPSEETGAVSEHLPTTDLLLPPLFNP
I t I I I I I I I I I I I I : I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I 
orf58ng-1      EAFGHDSQAVCPFEDVPSERPSCRVSDTEADEGAFQSEETGAVSEHLPTTDLLLPPLFNP

490 500 510 520 530 540 

550 560 570 580 590 600 

orf58-1.pep    EATQTEEELLENSITIEEKLAEFKVKVKVVDSYSGPVITRYEIEPDVGVRGNSVLNLEKD
I i I I I I I I I I I I I I I t I I I t I I I I I I I I I I I I I I I I I I I I I It I I I I I I I I I I I I I I I I I 
orf58ng-l EATQTEEELLENSITIEEKLAEFKVKVKWDSYSGPVITRYEIEPDVGVRGNSVLNLEKD 

550 560 570 580 590 600 

610 620 630 640 650 660 

orf 58-1 . pep LARSLGVASIRWETIPGKTCMGLELPNPKRQMIRLSEIFNSPEFAESKSKLTLALGQDI 
I I I I I I I I I I I I I I I I I I I I I I I I I II I I I I I I I I I I I I I i I i I I I I I I I I I I I I I I I I t 
orf58ng-1      LARSLGVASIRVVETIPGKTCMGLELPNPKRQMIRLSEIFNSPEFAESKSKLTLALGQDI

610 620 630 640 650 660 

670 680 690 700 710 720 

orf 58-1 . pep TGQPWTDLGKAPHLLVAGTTGSGKSVGVNAMILSMLFKAAPEDVRMIMIDPKMLELSIY 
I I I I I I I I I I I I I I I I i I I I I I I I I I I I I I i I I I I I I I I I I I I I I I I I I I I I t I I I t I I I 
orf58ng-1      TGQPVVTDLGKAPHLLVAGTTGSGKSVGVNAMILSMLFKAAPEDVRMIMIDPKMLELSIY

670 680 690 700 710 720 



WO 99/24578    PCT/IB98/01665



730 740 750 760 770 780 

orf58-1.pep    EGIPHLLAPVVTDMKLAANALNWCVNEMEKRYRLMSFMGVRNLAGFNQKIAEAAARGEKI

lit I I I I t I i M ) I M M I M i I I M I I I I I I I I I t I I t I I I I I I I I I I t I I I M I t I i 
orf58ng-l EGITHLLAPVVTDMKLAANALNWCVNEMEKRYRLMSFMGVRNLAGFNQKIAEAAARGEKI 

730 740 750 760 770 780 






790 800 810 820 830 840 

orf58-1.pep    GNPFSLTPDDPEPLEKLPFIVVVVDEFADLMMTAGKKIEELIARLAQKARAAGIHLILAT
I I I i I It I I It t I I I I I I I I I i I I I I I t I I I I I I I I i I I I M I I I I I I I I I I I t t I I i I I 
orf58ng-l GNPFSLTPDDPEPLEKLPFIWWDEFADLMMTAGKKIEELIARLAQKARAAGIHLILAT 

790 800 810 820 830 840 






850 860 870 880 890 900 

orf 58-1 . pep QRPSVDVITGLIKANIPTRIAFQVSSKIDSRTILDQMGAENLLGQGDMLFLLPGTAYPQR 
It I I I I t I I II II I I I n t I I It 1 I I I I I It I I I I I I I II I I I I II I It II I t I I I I I I 
orf58ng-l QRPSVDVITGLIKANIPTRIAFQVSSKIDSRTILDQMGAENLLGQGDMLFLPPGTAYPQR 

850 860 870 880 890 900 






910 920 930 940 950 960 

orf 58-1 . pep VHGAFASDEEVHRWEYLKQFGEPDYVDDILSGGGSEELPGIGRSGDDETDPMYDEAVSV 
I I i I I I I I i I I I I 1 I I I I I t I I 1 I I I 1 I I I It I I I i II t I t I I I I I t I I 1 I t It I I I I I 
orf58ng-l VHGAFASDEEVHRWEYLKQFGEPDYVDDILSGGGSEELPGIGRSGDGETDPMYDEAVSV 

910 920 930 940 950 960 

970 980 990 1000 1010 

orf58-1.pep    VLKTRKASISGVQRALRIGYNRAARLIDQMEAEGIVSAPEHNGNRTILVPLDNAX
I I I I I I t I I I I I I I I t t I I I t I I I I I I M I I I t t I I I I I I I I I II I I I I I I I I I I 
orf58ng-l VLKTRKASISGVQRALRIGYNRAARLIDQMEAEGIVSAPEHNGNRTILVPLDNAX 

970 980 990 1000 1010 



Furthermore, ORF58ng-1 shows significant homology to the E.coli protein FtsK:

sp|P46889|FTSK_ECOLI CELL DIVISION PROTEIN FTSK >gi|1651412|gnl|PID|d1015290 (Dl
division protein FtsK [Escherichia coli] >gi|1651418|gnl|PID|d1015296 (D90727) Cell
division protein FtsK [Escherichia coli] >gi|1787117 (AE000191) cell division
protein FtsK [Escherichia coli] Length = 1329
Score = 576 bits (1469), Expect = e-163

Identities = 301/459 (65%), Positives = 353/459 (76%), Gaps = 5/459 (1%)






Query:  556 IEEKLAEFKVKVKVVDSYSGPVITRYEIEPDVGVRGNSVLNLEKDLARSLGVASIRVVET 615
            +E +LA+F++K  VV+   GPVITR+E+    GV+   + NL +DLARSL   ++RVVE
Sbjct:  868 VEARLADFRIKADVVNYSPGPVITRFELNLAPGVKAARISNLSRDLARSLSTVAVRVVEV 927

Query:  616 IPGKTCMGLELPNPKRQMIRLSEIFNSPEFAESKSKLTLALGQDITGQPVVTDLGKAPHL 675
            IPGK  +GLELPN KRQ + L E+ ++ +F ++ S LT+ LG+DI G+PVV DL K PHL
Sbjct:  928 IPGKPYVGLELPNKKRQTVYLREVLDNAKFRDNPSPLTVVLGKDIAGEPVVADLAKMPHL 987

Query:  676 LVAGTTGSGKSVGVNAMILSMLFKAAPEDVRMIMIDPKMLELSIYEGITHLLAPVVTDMK 735
            LVAGTTGSGKSVGVNAMILSML+KA PEDVR IMIDPKMLELS+YEGI HLL  VVTDMK
Sbjct:  988 LVAGTTGSGKSVGVNAMILSMLYKAQPEDVRFIMIDPKMLELSVYEGIPHLLTEVVTDMK 1047

Query:  736 LAANALNWCVNEMEKRYRLMSFMGVRNLAGFNQKIAEAAARGEKIGNPFSLTPDDPEP-- 793
             AANAL WCVNEME+RY+LMS +GVRNLAG+N+KIAEA      I +P+    D  +
Sbjct: 1048 DAANALRWCVNEMERRYKLMSALGVRNLAGYNEKIAEADRMMRPIPDPYWKPGDSMDAQH 1107

Query:  794 --LEKLPFIVVVVDEFADLMMTAGKKIEELIARLAQKARAAGIHLILATQRPSVDVITGL 851
              L+K P+IVV+VDEFADLMMT GKK+EELIARLAQKARAAGIHL+LATQRPSVDVITGL
Sbjct: 1108 PVLKKEPYIVVLVDEFADLMMTVGKKVEELIARLAQKARAAGIHLVLATQRPSVDVITGL 1167

Query:  852 IKANIPTRIAFQVSSKIDSRTILDQMGAENLLGQGDMLFLPPGTAYPQRVHGAFASDEEV 911
            IKANIPTRIAF VSSKIDSRTILDQ GAE+LLG GDML+  P +  P RVHGAF  D+EV
Sbjct: 1168 IKANIPTRIAFTVSSKIDSRTILDQAGAESLLGMGDMLYSGPNSTLPVRVHGAFVRDQEV 1227

Query:  912 HRVVEYLKQFGEPDYVDDILSGGGSEELPGIGRSGDGETDPMYDEAVSVVLKTRKASISG 971
            H VV+  K  G P YVD I S   SE   G G  G  E DP++D+AV  V + RKASISG
Sbjct: 1228 HAVVQDWKARGRPQYVDGITSDSESEGGAG-GFDGAEELDPLFDQAVQFVTEKRKASISG 1286

Query:  972 VQRALRIGYNRAARLIDQMEAEGIVSAPEHNGNRTILVP 1010
            VQR  RIGYNRAAR+I+QMEA+GIVS   HNGNR +L P
Sbjct: 1287 VQRQFRIGYNRAARIIEQMEAQGIVSEQGHNGNREVLAP 1325
Based on this analysis, it is predicted that the proteins from N.meningitidis and N.gonorrhoeae,
and their epitopes, could be useful antigens for vaccines or diagnostics, or for raising antibodies.



Example 59 

The following partial DNA sequence was identified in N.meningitidis <SEQ ID 497>:

1 ATGATTTATC AAAGAAACCT CATCAAAGAA CTCTCTTTTA CCGCCGTCGG 

51 CATTTTCGTC GTCCTCTTGG CGGTATTGGT CTCCACGCAG GCAATCAACC 

101 TGCTCGGCCG TGCCGCCGAC GGGC..GTGA TCGCCATCGA TGCCGTGTTG 

151 GCATTGGTCG GCTTCTGGGT C 

// 

901 A TTGCCATCGG TTTGTTTTTA ATTTACCAAA ACGGGCTGAC 

951 CCTGCTTTTT GAAGCCGTGG AAGACGGCAA AATCCATTTT TGGCTCGGAC 

1001 TGCTGCCTAT GCACATTATC ATGTTTGTCC TTGCACTCAT CCTGTTGCGC 

1051 GTCCGCAGTA TGCCCAGCCA GCCCTTCTGG CAGGCGGTTG GCAAAAGTCT 

1101 GACATTGAAA GGCGGAAAAT GA 

This corresponds to the amino acid sequence <SEQ ID 498; ORF101>: 

1 MIYQRNLIKE LSFTAVGIFV VLLAVLVSTQ AINLLGRAAD GXVIAIDAVL 

51 ALVGFWV 

// 

301 ...IAIGLFL IYQNGLTLLF EAVEDGKIHF WLGLLPMHII MFVLALILLR
351 VRSMPSQPFW QAVGKSLTLK GGK* 

Further work revealed the complete nucleotide sequence <SEQ ID 499>: 



1 ATGATTTATC AAAGAAACCT CATCAAAGAA CTCTCTTTTA CCGCCGTCGG 

51 CATTTTCGTC GTCCTCTTGG CGGTATTGGT CTCCACGCAG GCAATCAACC 

101 TGCTCGGCCG TGCCGCCGAC GGGCGTGTCG CCATCGATGC CGTGTTGGCA 

151 TTGGTCGGCT TCTGGGTCAT CGGTATGACG CCGCTTTTGC TGGTGTTGAC 

201 CGCATTTATC AGTACGTTGA CCGTGTTGAC CCGCTACTGG CGCGACAGCG 

251 AAATGTCGGT CTGGCTATCC TGCGGATTGG CATTGAAACA ATGGATACGC 

301 CCGGTGATGC AGTTTGCCGT GCCGTTTGCC GTTTTGGTTG CCGTCATGCA 

351 GCTTTGGGTG ATACCGTGGG CAGAGCTACG CAGCCGCGAA TACGCTGAAA 

401 TCCTGAAGCA GAAGCAGGAA TTGTCTTTGG TGGAGGCAGG CGAGTTCAAC 

451 AGTTTGGGCA AGCGCAACGG CAGGGTTTAT TTTGTCGAAA CCTTCGATAC 

501 CGAATCCGGC ATCATGAAAA ACCTGTTCCT GCGCGAACAG GACAAAAACG 

551 GCGGCGACAA CATCATCTTC GCCAAAGAAG GTAACTTCTC GCTGAACGAC 

601 AACAAACGCA CGCTCGAATT GCGCCACGGC TACCGTTACA GCGGCACGCC 

651 CGGACGCGCC GACTACAATC AGGTTTCCTT CCAAAAACTC AACCTGATTA 

701 TCAGCACCAC GCCCAAACTC ATCGACCCCG TTTCCCACCG CCGTACCATT 

751 CCGACCGCCC AACTGATTGG CAGCAGCAAC CCGCAACATC AGGCGGAATT 

801 GATGTGGCGC ATCTCGCTGA CCGTCAGCGT CCTCCTACTC TGCCTGCTTG 

851 CCGTGCCGCT TTCCTATTTC AACCCGCGCA GCGGACATAC CTACAATATC 

901 TTGATTGCCA TCGGTTTGTT TTTAATTTAC CAAAACGGGC TGACCCTGCT 

951 TTTTGAAGCC GTGGAAGACG GCAAAATCCA TTTTTGGCTC GGACTGCTGC 

1001 CTATGCACAT TATCATGTTT GCCGTTGCAC TCATCCTGTT GCGCGTCCGC 

1051 AGTATGCCCA GCCAGCCCTT CTGGCAGGCG GTTGGCAAAA GTCTGACATT

1101 GAAAGGCGGA AAATGA 

This corresponds to the amino acid sequence <SEQ ID 500; ORF101-1>: 



1 MIYQRNLIKE LSFTAVGIFV VLLAVLVSTQ AINLLGRAAD GRVAIDAVLA

51 LVGFWVIGMT PLLLVLTAFI STLTVLTRYW RDSEMSVWLS CGLALKQWIR

101 PVMQFAVPFA VLVAVMQLWV IPWAELRSRE YAEILKQKQE LSLVEAGEFN

151 SLGKRNGRVY FVETFDTESG IMKNLFLREQ DKNGGDNIIF AKEGNFSLND

201 NKRTLELRHG YRYSGTPGRA DYNQVSFQKL NLIISTTPKL IDPVSHRRTI

251 PTAQLIGSSN PQHQAELMWR ISLTVSVLLL CLLAVPLSYF NPRSGHTYNI

301 LIAIGLFLIY QNGLTLLFEA VEDGKIHFWL GLLPMHIIMF AVALILLRVR

351 SMPSQPFWQA VGKSLTLKGG K*
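The correspondence between a nucleotide listing and its amino acid listing can be checked by translating the DNA in its first reading frame. The following Python sketch is illustrative only and is not the software used for the analysis: it assumes the standard genetic code, the names `CODON_TABLE` and `translate` are hypothetical, and any codon containing an ambiguous base (for example `N` or `.`) is reported as `X`.

```python
from itertools import product

# Standard genetic code, codons enumerated in T, C, A, G order.
BASES = "TCAG"
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    "".join(codon): aa
    for codon, aa in zip(product(BASES, repeat=3), AMINO_ACIDS)
}


def translate(dna: str) -> str:
    """Translate DNA in frame 1, stopping after the first stop codon ('*').

    Whitespace (as in the 10-base blocks of the listings) is ignored.
    Codons with unrecognised characters are translated as 'X' (assumption).
    """
    seq = "".join(dna.split()).upper()
    protein = []
    for i in range(0, len(seq) - 2, 3):
        aa = CODON_TABLE.get(seq[i:i + 3], "X")
        protein.append(aa)
        if aa == "*":
            break
    return "".join(protein)


# First 30 bases of <SEQ ID 499>.
print(translate("ATGATTTATC AAAGAAACCT CATCAAAGAA"))
```

For the first 30 bases of <SEQ ID 499> this prints MIYQRNLIKE, the first ten residues of <SEQ ID 500>.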



Computer analysis of this amino acid sequence gave the following results: 
Homology with a predicted ORF from N.meningitidis (strain A)

ORF101 shows 91.2% identity over a 57aa overlap and 95.7% identity over a 69aa overlap with
an ORF (ORF101a) from strain A of N.meningitidis:



                     10        20        30        40        50
orf101.pep   MIYQRNLIKELSFTAVGIFVVLLAVLVSTQAINLLGRAADGXVIAIDAVLALVGFWVX
             |||||||||||||||||||||||||||||||||||| |||    |||||||||||||
orf101a      MIYQRNLIKELSFTAVGIFVVLLAVLVSTQAINLLGXAADXRX-AIDAVLALVGFWVXXM
                     10        20        30        40         50

//

                                                   90       100       110
orf101.pep                                IAIGLFLIYQNGLTLLFEAVEDGKIHFWLGL
                                           ||||||||||||||||||||||||||||||
orf101a      LTVSVLLLCLLAVPLSYFNPRSGHTYNILXAIGLFLIYQNGLTLLFEAVEDGKIHFWLGL
                    280       290       300       310       320       330

                    120       130       140       150
orf101.pep   LPMHIIMFVLALILLRVRSMPSQPFWQAVGKSLTLKGGKX
             |||||||||:|::||||||||||||||||||||||||||
orf101a      LPMHIIMFVIAIVLLRVRSMPSQPFWQAVGKSLTLKGGKX
                    340       350       360       370

The complete length ORF101a nucleotide sequence <SEQ ID 501> is:



1 ATGATTTATC AAAGAAACCT CATCAAAGAA CTCTCTTTTA CCGCCGTCGG
51 CATTTTCGTC GTCCTCTTGG CGGTATTGGT CTCCACGCAG GCAATCAACC
101 TGCTCGGCCN TGCCGCCGAC NGGCGTNTCG CCATCGATGC CGTGTTGGCA
151 TTGGTCGGCT TCTGGGTCNN NNGNATGACG CCGCTTTTGC TNGTGTTGAC
201 CGCATTTATC AGTACGTTGA CCGTGTTGAC CCGCTACTGG CGNGACAGCG
251 AAATGTCGGT CTGGNTATCC TGCGGATTGG CATTGAAACA ATGGATACGC
301 CCGGTGATGC AGTTTGCCGT GCCGTTTGCC GTTTTGGTTG CCGTCATGCA
351 GCTTTGGGTG ATACCGTGGG CAGAGCTACG CAGCCGCGAA TACGCTGAAA
401 TCCTGAAGCA GAAGCAGGAA TTGTCTTTGG TGGAGGCAGG CGGGTTCAAC
451 AGTTTGGGCA AGCGCAACGG CAGGGTTTAT TTTGTCGAAA CCTTCGATAC
501 CGAATCCGGC ATCATGAAAA ACCTGTTCCT GCGCGAACAG GACAAAAACG
551 GCGGCGACAA CATCATCTTC NCCAAAGAAA GTAACTTCTC GCTGAACGAC
601 AACAAACGCA CGCTCGAATT GCGCCACGGC TACCGTTACA GCGGCACGCC
651 CGGACGCGCC GACTACAATC AGGTTTCCTT CCNAAAACTC AACCTGATTA
701 TCAGCACCAC GCCCAAACTC ATCGACCCCG TTTCCCACCG CCGTACNATN
751 CCNACNGCCC AACTGATTGG CAGCAGCAAC CCGCAACATC ANGCGGAATT
801 GATGTGGCGC ATCTCGCTGA CCGTCAGCGT CCTCCTACTC TGCCTGCTTG
851 CCGTGCCGCT TTCCTATTTC AACCCGCGCA GCGGACATAC CTACAATATC
901 TTGANTGCCA TCGGTTTGTT TTTAATTTAC CAAAACGGGC TGACCCTGCT
951 TTTTGAAGCC GTGGAAGACG GCAAAATCCA TTTTTGGCTC GGACTGCTGC
1001 CTATGCACAT CATCATGTTC GTCATCGCAA TCGTACTTCT GCGCGTCCGC
1051 AGCATGCCCA GCCAGCCCTT CTGGCAGGCG GTTGGCAAAA GTCTGACATT
1101 GAAAGGCGGA AAATGA



This encodes a protein having amino acid sequence <SEQ ID 502>: 



1 MIYQRNLIKE LSFTAVGIFV VLLAVLVSTQ AINLLGXAAD XRXAIDAVLA
51 LVGFWVXXMT PLLLVLTAFI STLTVLTRYW RDSEMSVWXS CGLALKQWIR
101 PVMQFAVPFA VLVAVMQLWV IPWAELRSRE YAEILKQKQE LSLVEAGGFN
151 SLGKRNGRVY FVETFDTESG IMKNLFLREQ DKNGGDNIIF XKESNFSLND
201 NKRTLELRHG YRYSGTPGRA DYNQVSFXKL NLIISTTPKL IDPVSHRRTX
251 PTAQLIGSSN PQHXAELMWR ISLTVSVLLL CLLAVPLSYF NPRSGHTYNI
301 LXAIGLFLIY QNGLTLLFEA VEDGKIHFWL GLLPMHIIMF VIAIVLLRVR
351 SMPSQPFWQA VGKSLTLKGG K*



ORF101a and ORF101-1 show 95.4% identity in 371 aa overlap:



orf101a.pep  MIYQRNLIKELSFTAVGIFVVLLAVLVSTQAINLLGXAADXRXAIDAVLALVGFWVXXMT 60
             |||||||||||||||||||||||||||||||||||| ||| | |||||||||||||  ||
orf101-1     MIYQRNLIKELSFTAVGIFVVLLAVLVSTQAINLLGRAADGRVAIDAVLALVGFWVIGMT 60

orf101a.pep  PLLLVLTAFISTLTVLTRYWRDSEMSVWXSCGLALKQWIRPVMQFAVPFAVLVAVMQLWV 120
             |||||||||||||||||||||||||||| |||||||||||||||||||||||||||||||
orf101-1     PLLLVLTAFISTLTVLTRYWRDSEMSVWLSCGLALKQWIRPVMQFAVPFAVLVAVMQLWV 120

orf101a.pep  IPWAELRSREYAEILKQKQELSLVEAGGFNSLGKRNGRVYFVETFDTESGIMKNLFLREQ 180
             ||||||||||||||||||||||||||| ||||||||||||||||||||||||||||||||
orf101-1     IPWAELRSREYAEILKQKQELSLVEAGEFNSLGKRNGRVYFVETFDTESGIMKNLFLREQ 180

orf101a.pep  DKNGGDNIIFXKESNFSLNDNKRTLELRHGYRYSGTPGRADYNQVSFXKLNLIISTTPKL 240
             |||||||||| ||:||||||||||||||||||||||||||||||||| ||||||||||||
orf101-1     DKNGGDNIIFAKEGNFSLNDNKRTLELRHGYRYSGTPGRADYNQVSFQKLNLIISTTPKL 240

orf101a.pep  IDPVSHRRTXPTAQLIGSSNPQHXAELMWRISLTVSVLLLCLLAVPLSYFNPRSGHTYNI 300
             ||||||||| ||||||||||||| ||||||||||||||||||||||||||||||||||||
orf101-1     IDPVSHRRTIPTAQLIGSSNPQHQAELMWRISLTVSVLLLCLLAVPLSYFNPRSGHTYNI 300

orf101a.pep  LXAIGLFLIYQNGLTLLFEAVEDGKIHFWLGLLPMHIIMFVIAIVLLRVRSMPSQPFWQA 360
             | ||||||||||||||||||||||||||||||||||||||::|::|||||||||||||||
orf101-1     LIAIGLFLIYQNGLTLLFEAVEDGKIHFWLGLLPMHIIMFAVALILLRVRSMPSQPFWQA 360

orf101a.pep  VGKSLTLKGGK 371
             |||||||||||
orf101-1     VGKSLTLKGGK 371
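An identity figure of the kind quoted in this Example can be recomputed from any pair of aligned sequences. The Python sketch below is a minimal illustration under stated assumptions, not the alignment program used here: the function name `percent_identity` is hypothetical, both inputs are assumed to be already aligned (equal length, `-` for gaps), and positions where either sequence has `X` or a gap are counted in the overlap but never as identical.

```python
def percent_identity(aligned_a: str, aligned_b: str) -> float:
    """Percentage of identical positions between two pre-aligned sequences.

    Assumptions: equal-length inputs, '-' marks a gap, and 'X' (unknown
    residue) never counts as a match. Columns that are gaps in both
    sequences are skipped.
    """
    if len(aligned_a) != len(aligned_b):
        raise ValueError("aligned sequences must have equal length")
    compared = 0
    identical = 0
    for a, b in zip(aligned_a.upper(), aligned_b.upper()):
        if a == "-" and b == "-":
            continue
        compared += 1
        if a == b and a not in "-X":
            identical += 1
    if compared == 0:
        raise ValueError("no aligned positions to compare")
    return 100.0 * identical / compared


# Residues 141-150 of ORF101a and ORF101-1 (one G/E difference).
print(percent_identity("LSLVEAGGFN", "LSLVEAGEFN"))
```

For the ten-residue window shown, nine positions agree, so the call returns 90.0. Applied to the full 371-column alignment, the same counting gives 354 identical positions, in line with the 95.4% figure stated above.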



Homology with a predicted ORF from N.gonorrhoeae

ORF101 shows 96.5% identity in 57aa overlap at the N-terminal domain and 95.1% identity in
61aa overlap at the C-terminal domain, respectively, with a predicted ORF (ORF101ng) from
N.gonorrhoeae:



orf101.pep   MIYQRNLIKELSFTAVGIFVVLLAVLVSTQAINLLGRAADGXVIAIDAVLALVGFWV 57
             ||||||||||||||||||||||||||||||||||||||||| | |||||||||||||
orf101ng     MIYQRNLIKELSFTAVGIFVVLLAVLVSTQAINLLGRAADGRV-AIDAVLALVGFWVIGM 59

//

orf101.pep                                 IAIGLFLIYQNGLTLLFEAVEDGKIHFWLG 333
                                           ||||||||||||||||||||||||||||||
orf101ng     SLTVSVLLLCLLAVPLSYFNPRSGHTYNILIAIGLFLIYQNGLTLLFEAVEDGKIHFWLG 331

orf101.pep   LLPMHIIMFVLALILLRVRSMPSQPFWQAVGKSLTLKGGK 373
             ||||||||||:|::|||||||||||||||||
orf101ng     LLPMHIIMFVIAIVLLRVRSMPSQPFWQAVG 362



The ORF101ng nucleotide sequence <SEQ ID 503> is predicted to encode a protein having partial
amino acid sequence <SEQ ID 504>:



1 MIYQRNLIKE LSFTAVGIFV VLLAVLVSTQ AINLLGRAAD GRVAIDAVLA
51 LVGFWVIGMT PLLLVLTAFI STLTVLTRYW RDSEMSVWLS CGLALKQWIR
101 PVMQFAVPFA ILIAVMQLWV IPWAELRSRE YAEILKQKQE LSLVEAGEFN
151 NLGKRNGRVY FVETFDTESG IMKNLFLREQ DKNGGDNIIF AKEGNFSLKD
201 NKRTLELRHG YRYSGTPGRA DYNQVSFQKL NLIISTTPKL IDPVSHRRTI
251 STAQLIGSSN PQHQAELMWR ISLTVSVLLL CLLAVPLSYF NPRSGHTYNI
301 LIAIGLFLIY QNGLTLLFEA VEDGKIHFWL GLLPMHIIMF VIAIVLLRVR
351 SMPSQPFWQA VG...



Further work revealed the complete nucleotide sequence <SEQ ID 505>: 



1 ATGATTTATC AAAGAAACCT CATCAAAGAA CTCTCTTTTA CCGCCGTCGG
51 CATTTTCGTC GTCCTCTTGG CGGTGTTGGT GTCCACGCAG GCGATCAACC
101 TGCTTGGCCG CGCAGCTGAC GGGCGTGTCG CCATCGATGC CGTGTTGGCC
151 TTAGTCGGCT TCTGGGTCAT CGGTATGACC CCGCTTTTGC TGGTGTTGAC
201 CGCATTCATC AGCACGCTGA CCGTATTGAC CCGCTACTGG CGCGACAGCG
251 AAATGTCGGT CTGGCTATCC TGCGGATTGG CGTTGAAACA GTGGATACGC
301 CCCGTCATGC AGTTTGCCGT GCCGTTTGCC ATCCTGATTG CCGTCATGCA
351 GCTTTGGGTG ATACCGTGGG CAGAGCTGCG CAGCCGCGAA TATGCCGAAA
401 TTTTGAAGCA GAAGCAGGAA TTGTCTTTGG TGGAAGCCGG CGAGTTCAAT
451 AACTTGGGCA AGCGCAACGG CAgggtttaT TtcgtcgaaA CCTTTGACAC
501 CGaatccgGC ATCATGAAAA ACCTGTtcct GcGCGAACAG GACAAAAACG
551 gcggcgacaA CATCATCTTC GCcaaaGAag gtaactTctc gctgaaggaC
601 AACAAAcgca cgctcgaATT GCGCCACGGC TACCGTTACA GCGGcacgcC
651 CGGacGCGCc gactaCAATC AGGTTtcctt cCAAAAacTc aacctgATta
701 TCAGCACCAC GCCCAAacTT ATCGaccCCG TTTCCCACCG CCGCACCATT
751 tcgacCGCCC AAcTGATTGG CAGCAGCAAT CCGCAACATC AGGCAGAATT
801 GATGTGGCGC ATCTCGCTGA CCGTCAGCGT CCTCCTGCTC TGCCTACTCG
851 CCGTGCCGCT TTCCTATTTC AACCCGCGCA GCGGACATAC CTACAATATC
901 TTGATTGCCA TCGGTTTGTT TTTAATTTAC CAAAACGGGC TGACCCTGCT
951 TTTTGAAGCC GTGGAAGACG GCAAAATCCA TTTTTGGCTC GGACTGCTGC
1001 CTATGCACAT CATCATGTTC GTCATCGCAA TCGTACTTCT GCGCGTCCGC
1051 AGTATGCCCA GCCAGCCCTT CTGGCAGGCG GTTGGCAAAA GTCTGACATT
1101 GAAAGgcgGA AAATGA

This corresponds to the amino acid sequence <SEQ ID 506; ORF101ng-1>:

1 MIYQRNLIKE LSFTAVGIFV VLLAVLVSTQ AINLLGRAAD GRVAIDAVLA
51 LVGFWVIGMT PLLLVLTAFI STLTVLTRYW RDSEMSVWLS CGLALKQWIR
101 PVMQFAVPFA ILIAVMQLWV IPWAELRSRE YAEILKQKQE LSLVEAGEFN
151 NLGKRNGRVY FVETFDTESG IMKNLFLREQ DKNGGDNIIF AKEGNFSLKD
201 NKRTLELRHG YRYSGTPGRA DYNQVSFQKL NLIISTTPKL IDPVSHRRTI
251 STAQLIGSSN PQHQAELMWR ISLTVSVLLL CLLAVPLSYF NPRSGHTYNI
301 LIAIGLFLIY QNGLTLLFEA VEDGKIHFWL GLLPMHIIMF VIAIVLLRVR
351 SMPSQPFWQA VGKSLTLKGG K*

ORF101ng-1 and ORF101-1 show 97.6% identity in 371 aa overlap:

                     10        20        30        40        50        60
orf101-1.pep MIYQRNLIKELSFTAVGIFVVLLAVLVSTQAINLLGRAADGRVAIDAVLALVGFWVIGMT
             ||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
orf101ng-1   MIYQRNLIKELSFTAVGIFVVLLAVLVSTQAINLLGRAADGRVAIDAVLALVGFWVIGMT
                     10        20        30        40        50        60

                     70        80        90       100       110       120
orf101-1.pep PLLLVLTAFISTLTVLTRYWRDSEMSVWLSCGLALKQWIRPVMQFAVPFAVLVAVMQLWV
             ||||||||||||||||||||||||||||||||||||||||||||||||||:|:|||||||
orf101ng-1   PLLLVLTAFISTLTVLTRYWRDSEMSVWLSCGLALKQWIRPVMQFAVPFAILIAVMQLWV
                     70        80        90       100       110       120

                    130       140       150       160       170       180
orf101-1.pep IPWAELRSREYAEILKQKQELSLVEAGEFNSLGKRNGRVYFVETFDTESGIMKNLFLREQ
             ||||||||||||||||||||||||||||||:|||||||||||||||||||||||||||||
orf101ng-1   IPWAELRSREYAEILKQKQELSLVEAGEFNNLGKRNGRVYFVETFDTESGIMKNLFLREQ
                    130       140       150       160       170       180

                    190       200       210       220       230       240
orf101-1.pep DKNGGDNIIFAKEGNFSLNDNKRTLELRHGYRYSGTPGRADYNQVSFQKLNLIISTTPKL
             ||||||||||||||||||:|||||||||||||||||||||||||||||||||||||||||
orf101ng-1   DKNGGDNIIFAKEGNFSLKDNKRTLELRHGYRYSGTPGRADYNQVSFQKLNLIISTTPKL
                    190       200       210       220       230       240

                    250       260       270       280       290       300
orf101-1.pep IDPVSHRRTIPTAQLIGSSNPQHQAELMWRISLTVSVLLLCLLAVPLSYFNPRSGHTYNI
             |||||||||| |||||||||||||||||||||||||||||||||||||||||||||||||
orf101ng-1   IDPVSHRRTISTAQLIGSSNPQHQAELMWRISLTVSVLLLCLLAVPLSYFNPRSGHTYNI
                    250       260       270       280       290       300

                    310       320       330       340       350       360
orf101-1.pep LIAIGLFLIYQNGLTLLFEAVEDGKIHFWLGLLPMHIIMFAVALILLRVRSMPSQPFWQA
             ||||||||||||||||||||||||||||||||||||||||::|::|||||||||||||||
orf101ng-1   LIAIGLFLIYQNGLTLLFEAVEDGKIHFWLGLLPMHIIMFVIAIVLLRVRSMPSQPFWQA
                    310       320       330       340       350       360

                    370
orf101-1.pep VGKSLTLKGGKX
             ||||||||||||
orf101ng-1   VGKSLTLKGGKX
                    370

Based on this analysis, including the presence of a putative leader sequence (double-underlined)
and several putative transmembrane domains (single-underlined) in the gonococcal protein, it is
predicted that the proteins from N.meningitidis and N.gonorrhoeae, and their epitopes, could be
useful antigens for vaccines or diagnostics, or for raising antibodies.

Example 60 

The following partial DNA sequence was identified in N.meningitidis <SEQ ID 507>:

1 ..GGTGGTGGTT TTATCAATGC TTCCTGTGCC ACTTTGACGA CAGCCAAACC

51 GCAATATCAA GCAGGAGACC TTAGCGCTTT TAAGATAAGG CAAGGCAATG 

101 TTGTAATCGC CGGACACGGT TTGGATGCAC GTGATACCGA TTACACACGT 

151 ATTCTCAGTT ATCATTCCAA AATCGATGCA CCCGTATGGG GACAAGATGT 

201 TCGTGTCGTC GCGGGACAAA ACGATGTGGC CGCAACAGGT GATGCACATT 

251 CGCCTATTCT CAATAATGCT GCTGCCAATA CGTCAAACAA TACAGCCAAC

301 AACGGCACAC ATATCCCTTT ATTTGCGATT GATACAGGCA AATTAGGAGG 

351 TAT.GTATGC CAACAAAATC ACCTTGATCA GTACGGTCGA GCAAGCAGGC 

401 ATTCGTAA 

This corresponds to the amino acid sequence <SEQ ID 508; ORF113>:

1 ..GGGFINASCA TLTTAKPQYQ AGDLSAFKIR QGNVVIAGHG LDARDTDYTR
51 ILSYHSKIDA PVWGQDVRVV AGQNDVAATG DAHSPILNNA AANTSNNTAN
101 NGTHIPLFAI DTGKLGGXVC QQNHLDQYGR ASRHS*

Computer analysis of this amino acid sequence gave the following results: 

Homology with pspA putative secreted protein of N.meningitidis (accession AF030941)
ORF113 and pspA show 44% aa identity in 179aa overlap:

orf113 GGGFINASCATLTTAKPQYQAGDLSAFKIRQGNVVIAGHGLDARDTDYTRILSYHSKIDA 60
GGG INA+ TLT+ P G+L+ F + G VVI G GLD D DYTRILS ++I+A
pspa GGGLINAASVTLTSGVPVLNNGNLTGFDVSSGKVVIGGKGLDTSDADYTRILSRAAEINA 256

orf113 PVWGQDVRVVAGQNDVAATGDAHSPILXXXXXXXXXXXXXXGTHIPLFAIDTGKLGGMYA 120
VWG+DV+VV+G+N + G + P AIDT LGGMYA
pspa GVWGKDVKVVSGKNKLDFDG SLAKTASAPSSSDSVTPTVAIDTATLGGMYA 307

orf113 NKITLISTVEQAGIRNQGQWFASAGNVAVNAEGKLVNTGMIAATGENHAVSLHARNVHN 179
+KITLIST A IRN+G+ FA+ G V ++A+GKL N+G I A +++ A+ V N
pspa DKITLISTDNGAVIRNKGRIFAATGGVTLSADGKLSNSGSIDAA EITISAQTVDN 362

Homology with a predicted ORF from N.gonorrhoeae

ORF113 shows 86.5% identity in 52aa overlap at the N-terminal part and 94.1% identity in 17aa
overlap at the C-terminal part with a predicted ORF (ORF113ng) from N.gonorrhoeae:

orf113                                  GGGFINASCATLTTAKPQYQAGDLSAFKIR 30
                                        |||||||| |||||::|||||||:|:||||
orf113ng  SHPSQLNGYIEVGGEIRAEWIANPAGIAVNGGGFINASRATLTTGQPQYQAGDFSGFKIR 224

orf113    QGNVVIAGHGLDARDTDYTRILSYHSKIDAPVWGQDVRVVAGQNDVAATGDAHSPILNNA 90
          |||:|||||||||||||:||||
orf113ng  QGNAVIAGHGLDARDTDFTRILVCQQNHLDQYGRTSRHS 263

orf113                         IDTGKLGGXVCQQNHLDQYGRASRHS 135
                                        ||||||||||||:||||
orf113ng  DFSGFKIRQGNAVIAGHGLDARDTDFTRILVCQQNHLDQYGRTSRHS 263

The complete length ORF113ng nucleotide sequence <SEQ ID 509> is predicted to encode a
protein having amino acid sequence <SEQ ID 510>:

1 MNKTLYRVIF NRKRGAVVAV AETTKREGKS CADSGSGSVY VKSVSFIPTH
51 SKAFCFSALG FSLCLALGTV NIAFADGItT DKAAPKTQQA TILQTGNGIP
101 QVNIQTPTSA GVSVNQYAQF DVGNRGAILN NSRSNTQTQL GGWIQGNPWL
151 TRGEARVVVN QINSSHPSQL NGYIEVGGRR AEVVIANPAG IAVNGGGFIN
201 ASRATLTTGQ PQYQAGDFSG FKIRQGNAVI AGHGLDARDT DFTRILVCQQ
251 NHLDQYGRTS RHS*

Based on this analysis, it is predicted that these proteins from N.meningitidis and N.gonorrhoeae,
and their epitopes, could be useful antigens for vaccines or diagnostics, or for raising antibodies. 



Example 61 

The following partial DNA sequence was identified in N.meningitidis <SEQ ID 511>:

1 ..TCAACGGGAC ATAGCGAACA AAATTACACT TTGCCGCGAG AAATCACACG 

51 CAACATTTCA CTGGGTTCAT TTGCCTATGA ATCGCATCGC AAAGCATTAA 

101 GCCATCATGC GCCCAGCCAA GGCACTGAGT TGCCGCAAAG CAACGGTATT 

151 TCGCTACCCT ATACGTCCAA TTCTTTTACC CCATTACCCA GCAGCAGCTT 

201 ATACATTATC AATCCTGTCA ATAAAGGCTA TCTTGTTGAA ACCGATCCAC 

251 GCTTTGCCAA CTACCGTCAA TGGTTGGGTA GTGACTATAT GCtGGACAGC 

301 CTCAAACTAG ACCCAAACAA TTTACATAAA CGTTTGGGTG ATGGTTATTA 

351 CGAGCAACGT TTAATCAATG AACAAATCGC AGAGCTGACA GGGCATCGTC 

401 GTTTAGAcGG TTATCAAAAC GACGAAGAAC AATTTAAAGC CTTAATGGAT 

451 AATGGCGCGA CTGCGGCACG TTcGATGAAT CTCAGCGTTG GCATTGCATT 

501 AAGTGCCGAG CAAGTAGCGC AACTGACCAG CGATATTGTT TGGTTGGTAC 

551 AAAAAGAAGT TAAGCTTCCT GATGGCGGCA CACAAACCGT ATTGGTGCCA 

601 CAGGTTTATG TACGCGTTAA AAATGGCGAC ATAGACGGTA AAGGTGCATT 

651 GTTGTCAGGC AGCAATACAC AAATCAATGT TTCAGGCAGC CTGAAAAACT 

701 CAGGCACGAT TGCAGGgCGC AATGCGCTTA TTATCAATAC CGATACGCTA 

751 GACAATATCG GTGGGCGTAT TCATGCGCAA AAATCAGCGG TTACGGCCAC 

801 ACAAGACATC AATAATATTG GCGGCATGCT TTCTGCCGAA CAGACATTAT 

851 TGCTCAACGC AGGCAACAAC ATCAACAGCC AAAGCACCAC CGCCAGCAGT 

901 CAAAATACAC AAGGCAGCAG CACCTACCTA GACCGAATGG CAGGTATTTA 

951 TATCACAGGC AAAGAAAAAG GTGTTT. . 

This corresponds to the amino acid sequence <SEQ ID 512; ORF115>:

1 ..STGHSEQNYT LPREITRNIS LGSFAYESHR KALSHHAPSQ GTELPQSNGI 

51 SLPYTSNSFT PLPSSSLYII NPVNKGYLVE TDPRFANYRQ WLGSDYMLDS 

101 LKLDPNNLHK RLGDGYYEQR LINEQIAELT GHRRLDGYQN DEEQFKALMD 

151 NGATAARSMN LSVGIALSAE QVAQLTSDIV WLVQKEVKLP DGGTQTVLVP 

201 QVYVRVKNGD IDGKGALLSG SNTQINVSGS LKNSGTIAGR NALIINTDTL 

251 DNIGGRIHAQ KSAVTATQDI NNIGGMLSAE QTLLLNAGNN INSQSTTASS 

301 QNTQGSSTYL DRMAGIYITG KEKGV. . 

Computer analysis of this amino acid sequence gave the following results: 

Homology with the pspA putative secreted protein of N.meningitidis (accession number AF030941)
ORF115 and pspA protein show 50% aa identity in 325aa overlap:

Orf115: 1 STGHSEQNYTLPREITRNISLGSFAYESHRKALSHHAPSQGTELPQSNGISLPYTSNSFT 60
STG+S Y E++ +I +G AY+ + + P + NGI +T
pspA: 778 STGYSRSPYEPAPEVS-SIRMGISAYKGYAPQQASDIPGTVVPWAENGIHPTFT 831

Orf115: 61 PLPSSSLYIINPVNKGYLVETDPRFANYRQWLGSDYMLDSLKLDPNNLHKRLGDGYYEQR 120
LP+SSL+ I P NKGYL+ETDP F +YR+WLGS YML +L+ DPN++HKRLGDGYYEQ+
pspA: 832 -LPNSSLFAIAPNNKGYLIETDPAFTDYRKWLGSGYMLAALQQDPNHIHKRLGDGYYEQK 890

Orf115: 121 LINEQIAELTGHRRLDGYQNDEEQFKALMDNGATAARSMNLSVGIALSAEQVAQLTSDIV 180
L+NEQIA+LTG+RRLDGY NDEEQFKALMDNG T A+ + L+ GIALSAEQVA+LTSDIV
pspA: 891 LVNEQIAKLTGYRRLDGYTNDEEQFKALMDNGITIAKELQLTPGIALSAEQVARLTSDIV 950

Orf115: 181 WLVQKEVKLPDGGTQTVLVPQVYVRVKNGDIDGKGALLSGSNTQINVSGSLKN-SGTIAG 239
WL + V LPDG TQTVL P+VYVR + D++G+GALLSGS I SG+++N G IAG
pspA: 951 WLENETVTLPDGTTQTVLKPKVYVRARPKDMNGQGALLSGS WDIG- SGAIENRGGLIAG 1009



STGHSEQNYTLPREITRNISLGSFAYESHRK 
III II I II II : II I I : I I I I I I I I I I I t 
NEQTEX^EKKVFSENGKLHNYWRARRKGHDETGHREQNYTLPEEITRDISLGSFAYESHSK 



31 



71 



81 



Orf115: 240 RNALIINTDTLDNIGGRIHAQKSAVTATQDINNIGGMLSAEQTLLLNAGXXXXXXXXXXX 299
R ALI+N +N+G++ ADING+AE LLL A
pspA: 1010 REALILNAQNIKNLQGDLQGKNIFAAAGSDITNTGS-IGAENALLLKASNNIESRSETRS 1068

Orf115: 300 XXXXXXXXXYLDRMAGIYITGKEKG 324
+ R+AGIY+TG++ G
pspA: 1069 NQNEQGSVRNIGRVAGIYLTGRQNG 1093

Homology with a predicted ORF from N. gonorrhoeae 

ORF115 shows 91.9% identity over a 334aa overlap with a predicted ORF (ORF115ng) from
N.gonorrhoeae:

orf llS.pep 
orf llSng 
orf 115. pep 
orf 115ng 
orf 115. pep 
orf llSng 
orf 115. pep 
orfllSng 
orf 115. pep 
orf 115ng 
orf 115. pep 
orf 115ng 
orf 115 .pep 
orfllSng 



ALSHHAPSQGTELPQSN GISLPYTStJSFTPLPSSSLYIINPVNKGYLVET 

111:1111111111111 Itlllll I I I I I II : I I I I II I I : I I I I I I I I 

ALSRHAPSQGTELPQSNRDNIRTAKSNGISLPYTPNSFTPLPGSSLYIXNPANKGYLVET 131 

DPRFANYRQWLGSDYMLDSLKLDPNNLHKRLGDGYYEQRLINEQIAELTGHRRLDGYQND 141 
I I I II I M I I I I I II I I I I II I II I II I I I I I I I I I I I II I II I I I I I I I I I I I I I I II 
DPRFANYRQWLGSDYMLGSLKLDPNNLHKRLGDGYYEQRLINEQIAELTGHRRLDGYQND 191 

EEQFKALMDNGATAARSMNLSVGIALSAEQVAQLTSDIVWLVQKEVKLPDGGTQTVLVPQ 201 
I I M I I I I I t I I I I I I I I I I I I I I I I I I I I : I I I t I I I I I I I I I I I I II I I ) I I I I 1 : I I 
EEQFKALMDNGATAARSMNLSVGIALSAEQAAQLTSDIVWLVQKEVKLPDGGTQTVLMPQ 251 

VYVRVKNGDIDGKGALLSGSNTQINVSGSLKNSGTIAGRNALIINTDTLDNIGGRIHAQK 261 
I I I I I I I I I t I I I I I I I i t t I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I t I 
VYVRVKNGGIDGKGALLSGSNTQINVSGSLKNSGTIAGRNALIINTDTLDNIGGRIHAQK 311 

SAVTATQDINNIGGMLSAEQTLLLNAGNNINSQSTTASSQNTQGSSTYLDRMAGIYITGK 321 
I I I I I t I I i I I I I I : I I I I I II I I I I I I I I I : 1 I I : I I I I : I I I I I I I I I t I I I I I I I I 
SAVTATQDINNIGGILSAEQTLLLNAGNNINNQSTT^SSQNAQGSSTYLDRMAGIYITGK 371 



EKGV 
I I II 

EKGVLAAQAGKDINIIAGQISNQSDQGQTRLQAGRDINLDTVQTGKYQEIHFDADNHTIR 



325 



431 



An ORF115ng nucleotide sequence <SEQ ID 513> was predicted to encode a protein having amino
acid sequence <SEQ ID 514>:



1 MLVQTEKDGL HNEQTFGEKK VFSENGKLHN YWRARRKGHD ETGHREQNYT
51 LPEEITRDIS LGSFAYESHS KALSRHAPSQ GTELPQSNRD NIRTAKSNGI
101 SLPYTPNSFT PLPGSSLYII NPANKGYLVE TDPRFANYRQ WLGSDYMLGS
151 LKLDPNNLHK RLGDGYYEQR LINEQIAELT GHRRLDGYQN DEEQFKALMD
201 NGATAARSMN LSVGIALSAE QAAQLTSDIV WLVQKEVKLP DGGTQTVLMP
251 QVYVRVKNGG IDGKGALLSG SNTQINVSGS LKNSGTIAGR NALIINTDTL
301 DNIGGRIHAQ KSAVTATQDI NNIGGILSAE QTLLLNAGNN INNQSTAKSS
351 QNAQGSSTYL DRMAGIYITG KEKGVLAAQA GKDINIIAGQ ISNQSDQGQT
401 RLQAGRDINL DTVQTGKYQE IHFDADNHTI RGSTNEVGSS IQTKGDVTLL
451 SGNNLNAKAA EVGSAKGTLA VYAKNDITIS SGIHAGQVDD ASKHTGRSGG
501 GNKLVITDKA QSHHETAQSS TFEGKQVVLQ AGNDANILGS NVISDNGTRI
551 QAGNHVRIGT TQTQSQSETY HQTQKSGLMS AGIGFTIGSK TNTQENQSQS
601 NEHTGSTVGS LKGDTTIVAS KHYEQTGSNV SSPEGNNLIS TQSMDIGAAQ
651 NQLNSKTTQT YEQKGLTVAF SSPVTDLAQQ AIAVAHKAAK QFDKAKTTAL
701 MPWRLPMQVG RLFKQAKAPK K*



Further work revealed the following partial gonococcal DNA sequence <SEQ ID 515>: 



1 TTGCTTGTGC AAACAGAAAA AGACGGTTTG CATAACGAGC AAACCTTTGG
51 CGAGAAGAAA GTCTTCAGCG AAAATGGTAA GTTGCACAAC TACTGGCGTG
101 CGCGTCGTAA AGGACATGAT GAAACAGGGC ATCGTGAACA AAATTATACT
151 TTGCCGGAGG AAATCACACG CGACATTTCA CTGGGTTCAT TTGCCTATGA
201 ATCGCATAGC AAAGCATTAA GCCGTCATGC GCCCAGCCAA GGCACTGAGT
251 TGCCACAAAG TAACCGGGAT AATATCCGTA CTGCGAAAAG CAACGGTATT
301 TCGCTACCCT ATACGCCCAA TTCTTTTACC CCATTACCCG GCAGCAGCTT

351 ATACATTATC AATCCTGCCA ATAAAGGCTA TCTTGTTGAA ACCGATCCAC 

401 GCTTTGCCAA CTACCGTCAA TGGTTGGGTA GTGACTATAT GCTGGGCAGC 

451 CTCAAACTAG ACCCAAACAA TTTACATAAA CGTTTGGGTG ATGGTTATTA 

501 CGAGCAACGT TTAATCAATG AACAAATCGC AGAGCTGACA GGGCATCGTC 

551 GTTTAGACGG TTATCAAAAC GACGAAGAAC AATTTAAAGC CTTAATGGAT 

601 AATGGCGCGA CTGCGGCACG TTCGATGAAT CTCAGCGTTG GCATTGCATT 

651 AAGTGCCGAG CAAGCAGCGC AACTGACCAG CGATATTGTT TGGTTGGTAC 

701 AAAAAGAAGT TAAACTTCCT GATGGCGGCA CACAAACCGT ATTGATGCCA 

751 CAGGTTTATG TACGCGTTAA AAATGGCGGC ATAGACGGTA AAGGTGCATT 

801 GTTGTCAGGC AGCAATACAC AAATCAATGT TTCAGGCAGC CTGAAAAACT 

851 CAGGCACGAT TGCAGGGCGC AATGCGCTTA TTATCAATAC CGATACGCTA 

901 GACAATATCG GTGGGCGTAT TCATGCGCAA AAATCAGCGG TTACGGCCAC 

951 ACAAGACATC AATAATATTG GCGGCATTCT TTCTGCCGAA CAGACATTAT 

1001 TGCTCAATGC GGGTAACAAC ATCAACAACC AAAGCACGGC CAAGAGCAGT 

1051 CAAAATGCAC AAGGTAGCAG CACCTACCTA GACCGAATGG CAGGTATTTA 

1101 TATCACAGGC AAAGAAAAAG GTGTTTTAGC AGCGCAGGCA GGCAAAGACA 

1151 TCAACATCAT TGCCGGTCAA ATCAGCAATC AATCAGATCA AGGGCAAACC 

1201 CGGCTGCAGG CAGGACGCGA CATTAACCTG GATACGGTAC AAACCGGCAA 

1251 ATATCAAGAA ATCCATTTTG ATGCCGATAA CCATACCATC CGAGGTTCAA 

1301 CGAACGAAGT CGGCAGCAGC ATTCAAACAA AAGGCGATGT TACCCtatTG 

1351 TCAGGGAATA ATCTCAATGC CAAAGCTGCC GAAGTCGGCA GCGCAAAAGG 

1401 CACACTTGCC GTGTATGCTA AAAATGACAT TACTATCAGC TCAGGCATCC 

1451 ATGCCGGCCA AGTTGATGAT GCGTCCAAAC ATACAGGCAG AAGCGGCGGC 

1501 GGTAATAAAT TAGTCATTAC CGATAAAGCC CAAAGTCATC ACGAAACTGC 

1551 TCAAAGCAGC ACCTTTGAAG GCAAGCAAGT TGTATTGCAG GCAGGAAACG 

1601 ATGCCAACAT CCTTGGCAGT AATGTTATTT CCGATAATGG CACCCGGATT 

1651 CAAGCAGGCA ATCATGTTCG CATTGGTACA ACCCAAACTC AAAGCCAAAG 

1701 CGAAACCTAT CATCAAACCC AAAAATCAGG ATTGATGAGT GCAGGTATCG 

1751 GCTTCACTAT TGGCAGCAAG ACAAACACAC AAGAAAACCA ATCCCAAAGC 

1801 AACGAACATA CAGGCAGTAC CGTAGGCAGC CTGAAAGGCG ATACCACCAT 

1851 TGTTGCAAGC AAACACTACG AACAAACCGG CAGCAACGTT TCCAGCCCTG 

1901 AGGGCAACAA CCTTATCAGC ACGCAAAGTA TGGATATTGG CGCAGCACAA 

1951 AACCAATTAA ACAGCAAAAC CACCCAAACC TACGAACAAA AAGGCTTAAC 

2001 GGTGGCATTC AGTTCGCCCG TTACCGATTT GGCACAACAA GCGATTGCCG 

2051 TAGCACACAA AGCAGCAAAC AAGTCGGACA AAGCAAAAAC GACCGCGTTA 

2101 ATGCCATGGC GGCTGCCAAT GCAGGTTGGC AGGCCTATCA AACAGGCAAA 

2151 GGCGCACAAA ACTTAG 

This corresponds to the amino acid sequence <SEQ ID 516; ORF115ng-1>:



1 LLVQTEKDGL HNEQTFGEKK VFSENGKLHN YWRARRKGHD ETGHREQNYT 

51 LPEEITRDIS LGSFAYESHS KALSRHAPSQ GTELPQSNRD NIRTAKSNGI 

101 SLPYTPNSFT PLPGSSLYII NPANKGYLVE TDPRFANYRQ WLGSDYMLGS 

151 LKLDPNNLHK RLGDGYYEQR LINEQIAELT GHRRLDGYQN DEEQFKALMD 

201 NGATAARSMN LSVGIALSAE QAAQLTSDIV WLVQKEVKLP DGGTQTVLMP 

251 QVYVRVKNGG IDGKGALLSG SNTQINVSGS LKNSGTIAGR NALIINTDTL 

301 DNIGGRIHAQ KSAVTATQDI NNIGGILSAE QTLLLNAGNN INNQSTAKSS 

351 QNAQGSSTYL DRMAGIYITG KEKGVLAAQA GKDINIIAGQ ISNQSDQGQT 

401 RLQAGRDINL DTVQTGKYQE IHFDADNHTI RGSTNEVGSS IQTKGDVTLL 

451 SGNNLNAKAA EVGSAKGTLA VYAKNDITIS SGIHAGQVDD ASKHTGRSGG 

501 GNKLVITDKA QSHHETAQSS TFEGKQVVLQ AGNDANILGS NVISDNGTRI

551 QAGNHVRIGT TQTQSQSETY HQTQKSGLMS AGIGFTIGSK TNTQENQSQS 

601 NEHTGSTVGS LKGDTTIVAS KHYEQTGSNV SSPEGNNLIS TQSMDIGAAQ 

651 NQLNSKTTQT YEQKGLTVAF SSPVTDLAQQ AIAVAHKAAN KSDKAKTTAL 

701 MPWRLPMQVG RPIKQAKAHK T* 

This gonococcal protein (ORF115ng-1) shows 91.9% identity with ORF115 over 334aa:



20 30 40 50 60 70 

orf 115ng-l .p NEQTFGEKKVFSENGKLHNYWRARRKGHDETGHREQNYTLPEEITRDISLGSFAYESHSK 

III I I I I t I I : I I t I : II t I I II H I I I 
orf 115 STGHSEQNYTLPREITRNISLGSFAYESHRK

10 20 30 



80 90 100 110 120 130 

orf 115ng-l . p ALSRHAPSQGTELPQSNRDNIRTAKSNGISLPYTPNSFTPLPGSSLYIINPANKGYLVET 
M I : I t M I I II t I I II I I I I I I I I I I i I tl : I I M I I I I : I I I n I I I 

orf 115 ALSHHAPSQGTELPQSN GISLPYTSNSFTPLPSSSLYIINPVNKGYLVET 

40 50 60 70 80 
140 150 160 170 180 190

orf 115ng-l .p DPRFANYRQWLGSDYMLGSLKLDPNNLHKRLGDGYYEQRLINEQIAELTGHRRLDGYQND 
I I I I I I i I I I M I I t I I I I I I I I I I I I I I I I I I I I I ! I I I I I I M t I I I I I I I I I I I I t 
orf 115 DPRFANYRQWLGSDYMLDSLKLDPNNLHKRLGDGYYEQRLINEQIAELTGHRRLDGYQND 
90 100 110 120 130 140 






200 210 220 230 240 250 

orf 115ng-l .p EEQFKALMDNGATAARSMNLSVGIALSAEQAAQLTSDIVWLVQKEVKLPDGGTQTVLMPQ 
lltlllIIIIMIIIIIIMillllllll|:tlllllllll[llll[tlllllllll:ll 
orf 1 1 5 EEQFKALMDNGATAARSMNLSVGIALSAEQVAQLTSDTVWLVQKEVKLPDGGTQTVLVPQ 
150 160 170 180 190 200 






260 270 280 290 300 310 

orf 115ng-l .p VYVRVKNGGIDGKGALLSGSNTQINVSGSLKNSGTIAGRNALIINTDTLDNIGGRIHAQK 
IIIIIIII illlfllllMllilllltllllllllllMllllllilllllllMMIi 
orf 115 VYVRVKNGDIDGKGALLSGSNTQINVSGSLKNSGTIAGRNALIINTDTLDNIGGRIHAQK 

210 220 230 240 250 260 



320 330 340 350 360 370 

orf 115ng-l .p SAVTATQDINNIGGILSAEQTLLLNAGNNINNQSTAKSSQNAQGSSTYLDRMAGIYITGK

llll)lllllllll:tlllllltllllllll:ll|: I I t I : I I I I I I I I I I t I I I I I t I 
orf 115 SAVTATQDINNIGGMLSAEQTLLLNAGNNINSQSTTASSQNTQGSSTYLDRMAGIYITGK 
270 280 290 300 .310 320 

380 390 400 410 420 430

orf 115ng-l . p EKGVLAAQAGKDINIIAGQISNQSDQGQTRLQAGRDINLDTVQTGKYQEIHFDADNHTIR 
MM 

orf 115 EKGV 

In addition, it shows homology with a secreted N.meningitidis protein in the database:

gi|2623258 (AF030941) putative secreted protein [Neisseria meningitidis] Length = 2273

Score = 604 bits (1541), Expect = e-172

Identities = 325/678 (47%), Positives = 449/678 (65%), Gaps = 22/678 (3%)

Query: 1 LLVQTEKDGLHNEQTFGEKKVFSENGKLHNYWRARRKGHDETGHREQNYTLPEEITRDIS 60

L+V T + L N++T G K + ++ G LH Y R +KG D TG+ Y E++ I
Sbjct: 739 LIVGTPESALDNDETLGTKTI-TDKGDLHRYHRHHKKGRDSTGYSRSPYEPAPEVS-SIR 796






Query: 61 LGSFAYESHSKALSRHAPSQGTELPQSNRDNIRTAKSNGISLPYTPNSFTPLPGSSLYII 120 

+G AY+ + AP Q +++P + + NGI +T LP SSL+ I 

Sbjct: 797 MGISAYKGY APQQASDIPGTV VPWAENGIHPTFT LPNSSLFAI 840 






Query : 121 NPANKGYLVETDPRFANYRQWLGSDYMLGSLKLDPNNLHKRLGDGYYEQRLINEQIAELT 180 

P NKGYL+ETDP F +YR+WLGS YML +L+ DPN++HKRLGDGYYEQ+L+NEQIA+LT 
Sbjct: 841 APNNKGYLIETDPAFTDYRKWLGSGYMLAALQQDPNHIHKRLGDGYYEQKLVNEQIAKLT 900 






Query: 181 GHRRLDGYQNDEEQFKALMDNGATAARSMNLSVGIALSAEQAAQLTSDIVWLVQKEVKLP 240 

G+RRLDGY NDEEQFKALMDNG T A+ + L+ GIALSAEQ A+LTSDIVWL + V LP 
Sbjct: 901 GYRRLDGYTNDEEQFKALMDNGITIAKELQLTPGIALSAEQVARLTSDIVWLENETVTLP 960 

Query: 241 DGGTQTVLMPQVYVRVPCNGGIDGKGALLSGSNTQINVSGSLKN-SGTIAGRNALIINTDT 299 

DG TQTVL P+VYVR + ++G+GALLSGS I SG+++N G lAGR ALI+N 
Sbjct: 961 DGTTQTVLKPKVYVRARPKDMNGQGALLSGSWDIG-SGAIENRGGLIAGREALILNAQN 1019 

Query: 300 LDNIGGRIHAQKSAVTATQDINNIGGILSAEQTLLLNAGNNINNQSTAKSSQNAQGSSTY 359 

^N+G++ ADINGIAE LLL A NNI ++S +S+QN QGS 

Sbjct: 1020 IKNLQGDLQGKNIFAAAGSDITNTGSI-GAENALLLKASNNIESRSETRSNQNEQGSVRN 1078 






Query: 360 LDRMAGIYITGKEKGVLAAQAGKDrNIIAGQISNQSDQGQTRLQAGRDINLDTVQTGKYQ 419 

+ R+AGIY-fTG++ G + AG +1 + A +++NQS+ GQT L AG DI DT + Q 
Sbjct: 1079 IGRVAGIYLTGRQNGSVLLDAGNNIVLTASELTNQSEDGQTVLNAGGDIRSDTTGISRNQ 1138 






Query: 420 EIHFDADNHTIRGSTNEVGSSIQTKGDVTLLSGNNLNAKAAEVGSAKGTLAVYAKNDITI 479 

FD+DN+ IR NEVGS+I+T+G+++L + ++ +AAEVGS +G L + A DI + 
Sbjct: 1139 NTIFDSDNYVIRKEQNEVGSTIRTRGNLSLNAKGDIRIRAAEVGSEQGRLKLAAGRDIKV 1198



Query: 480 SSGIHAGQVDDASKHTGRSGGGNKLVITDKAQSHHETAQSSTFEGKQVVLQAGNDANILG 539

+G + +DA K+TGRSGGG K +T ++ + A S T +GK+++L +G D + G 
Sbjct: 1199 EAGKAHTETEDALKYTGRSGGGIKQKMTRHLKNQNGQAVSGTLDGKEIILVSGRDITVTG 1258 
Query: 540 SNVISDNGTRIQAGNHVRIGTTQTQSQSETYHQTQKSGLM-SAGIGFTIGSKTNTQENQS 598

SN+I+DN T + A N++ + +T+S+S ++ +KSGLM S GIGFT GSK +TQ N+S 
Sbjct: 1259 SNIIADNHTILSAKNNIVLKAAETRSRSAEMNKKEKSGLMGSGGIGFTAGSKKDTQTNRS 1318 

Query: 599 QSNEHTGSTVGSLKGDTTIVASKHYEQTGSNVSSPEGNNLISTQSMDIGAAQNQLNSKTT 658 

++ HT S VGSL G+T I A KHY QTGS +SSP+G+ IS+ + I AAQN+ + ++ 
Sbjct: 1319 ETVSHTESWGSLNGNTLISAGKHYTQTGSTISSPQGDVGISSGKISIDAAQNRYSQESK 1378 

Query: 659 QTYEQKGLTVAFSSPVTD 676 

Q YEQKG+TVA S PV + 
Sbjct: 1379 QVYEQKGVTVAISVPWN 1396 

Based on this analysis, it is predicted that the proteins from N.meningitidis and N.gonorrhoeae, and
their epitopes, could be useful antigens for vaccines or diagnostics, or for raising antibodies.



Example 62 

The following partial DNA sequence was identified in N.meningitidis <SEQ ID 517>:

1 ..TCAGGGAATA ACCTCAATGC CAAAGCTGCC GAAGTCAGCA GCGCAAACGG 

51 TACACTCGCT GTGTCTGCCA ATAATGACAT CAACATCAGC GCAGGCATCA 

101 ACACGACCCA TGTTGATGAT GCGTCCAAAC ACACAGGCAG AAGCGGTGGT 

151 GGCAATAAAT TAGTCATTAC CGATAAAGCC CAAAGTCATC ACGAAACCGC 

201 CCAAAGCAGC ACCTTTGAAG GCAAGCAAGT TGTATTGCAG GCAGGAAACG

251 ATGCCAACAT CCTTGGCAGC AATGTTATTT CCGATAATGG CACCCAGATT 

301 CAAGCAGGCA ATCATGTTCG CATTGGTACA ACCCAAACTC AAAGCCAAAG 

351 CGAAACCTAT CATCAAACCC AGAAATCAGG ATTGATGAGT GCAGGTATCG 

401 GCTTCACTAT TGGCAGCAAG ACAAACACAC AAGAAAACCA ATCCCAAAGC 

451 AACGAACATA CAGGCAGTAC CGTAGGCAGC TTGAAAGGCG ATACCACCAT 

501 TGTTGCAGGC AAACACTACG AACAAATCGG CAGTACCGTT TCCAGCCCGG 

551 AAGGCAACAA TACCATCTAT GCCCAAAGCA TAGACATTCA AGCGGCACAC 

601 AACAAATTAA ACAGTAATAC CACCCAAACC TATGAACAAA AAGG.CTAAC 

651 GGTGGCATTC AGTTCGCCCG TTACCGATTT GGCACAACAA . . . 

This corresponds to the amino acid sequence <SEQ ID 518; ORF117>:



1 ..SGNNLNAKAA EVSSANGTLA VSANNDINIS AGINTTHVDD ASKHTGRSGG 

51 GNKLVITDKA QSHHETAQSS TFEGKQVVLQ AGNDANILGS NVISDNGTQI

101 QAGNHVRIGT TQTQSQSETY HQTQKSGLMS AGIGFTIGSK TNTQENQSQS 

151 NEHTGSTVGS LKGDTTIVAG KHYEQIGSTV SSPEGNNTIY AQSIDIQAAH 

201 NKLNSNTTQT YEQKXLTVAF SSPVTDLAQQ . . . 
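
The correspondence between a nucleotide listing and its amino acid listing can be checked by translating the DNA in frame. The following is a minimal Python sketch, assuming the standard genetic code and reading frame 1. The function name `translate` and the choice to report any codon containing an ambiguous or unread base as `X` are illustrative assumptions, not part of the original analysis.

```python
from itertools import product

# Standard genetic code, with codons enumerated in TCAG order.
BASES = "TCAG"
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    "".join(codon): aa
    for codon, aa in zip(product(BASES, repeat=3), AMINO_ACIDS)
}


def translate(dna: str) -> str:
    """Translate DNA in frame 1; codons with non-ACGT symbols become 'X'."""
    seq = "".join(dna.split()).upper()
    usable = len(seq) - len(seq) % 3  # ignore a trailing partial codon
    return "".join(
        CODON_TABLE.get(seq[i:i + 3], "X") for i in range(0, usable, 3)
    )


# First 30 bases of the partial sequence listed above.
print(translate("TCAGGGAATA ACCTCAATGC CAAAGCTGCC"))  # SGNNLNAKAA
```

Applied to the whole listing, a position left unread in the DNA (shown as a dot) would surface as `X` in the protein, which is how the `X` in the amino acid listing can be interpreted.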

Computer analysis of this amino acid sequence gave the following results: 

Homology with the pspA putative secreted protein of N.meningitidis (accession number AF030941)
ORF117 and pspA protein show 45% aa identity in 224aa overlap:

Orf117: 4 NLNAKAAEVSSANGTLAVSANNDINISAGINTTHVDDASKHTGRSGGGNKLVITDKAQSH 63

++ +AAEV S G L ++A DI + AG T +DA K+TGRSGGG K +T ++
pspA: 1173 DIRIRAAEVGSEQGRLKLAAGRDIKVEAGKAHTETEDALKYTGRSGGGIKQKMTRHLKNQ 1232

Orf117: 64 HETAQSSTFEGKQVVLQAGNDANILGSNVISDNGTQIQAGNHVRIGTTQTQSQSETYHQT 123

+ A S T +GK+++L +G D + GSN+I+DN T + A N++ + +T+S+S ++
pspA: 1233 NGQAVSGTLDGKEIILVSGRDITVTGSNIIADNHTILSAKNNIVLKAAETRSRSAEMNKK 1292

Orf117: 124 QKSGLM-SAGIGFTIGSKTNTQENQSQSNEHTGSTVGSLKGDTTIVAGKHYEQIGSTVSS 182

+KSGLM S GIGFT GSK +TQ N+S++ HT S VGSL G+T I AGKHY Q GST+SS
pspA: 1293 EKSGLMGSGGIGFTAGSKKDTQTNRSETVSHTESVVGSLNGNTLISAGKHYTQTGSTISS 1352

Orf117: 183 PEGNNTIYAQSIDIQAAHNKLNSNTTQTYEQKXLTVAFSSPVTD 226

P+G+ I+ IIAAN++ +Q YEQK +TVA S PV +
pspA: 1353 PQGDVGISSGKISIDAAQNRYSQESKQVYEQKGVTVAISVPVVN 1396
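
An identity figure such as "45% aa identity in 224aa overlap" is the fraction of alignment columns in which both rows carry the same residue. The sketch below shows that calculation for two pre-aligned rows. It assumes that gap columns count towards the denominator, and `percent_identity` is a hypothetical helper name rather than anything used in the original analysis.

```python
def percent_identity(aligned_a: str, aligned_b: str) -> float:
    """Percentage of alignment columns where both rows carry the same residue."""
    if len(aligned_a) != len(aligned_b):
        raise ValueError("aligned sequences must have equal length")
    if not aligned_a:
        raise ValueError("alignment is empty")
    # A gap ('-') never counts as a match, but its column stays in the denominator.
    matches = sum(
        1 for x, y in zip(aligned_a, aligned_b) if x == y and x != "-"
    )
    return 100.0 * matches / len(aligned_a)


# Opening 18 columns of the ORF117 row starting at residue 124, against pspA.
row_orf117 = "QKSGLM-SAGIGFTIGSK"
row_pspa = "EKSGLMGSGGIGFTAGSK"
print(round(percent_identity(row_orf117, row_pspa), 1))  # 77.8
```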
Homology with a predicted ORF from N.gonorrhoeae

ORF117 shows 90% identity over a 230aa overlap with a predicted ORF (ORF117ng) from
N.gonorrhoeae:



orf117.pep                                 SGNNLNAKAAEVSSANGTLAVSANNDINIS 30
                                           |||||||||||| || ||||| | ||| ||
orf117ng     IHFDADNHTIRGSTNEVGSSIQTKGDVTLLSGNNLNAKAAEVGSAKGTLAVYAKNDITIS 480

orf117.pep   AGINTTHVDDASKHTGRSGGGNKLVITDKAQSHHETAQSSTFEGKQVVLQAGNDANILGS 90
              ||    |||||||||||||||||||||||||||||||||||||||||||||||||||||
orf117ng     SGIHAGQVDDASKHTGRSGGGNKLVITDKAQSHHETAQSSTFEGKQVVLQAGNDANILGS 540

orf117.pep   NVISDNGTQIQAGNHVRIGTTQTQSQSETYHQTQKSGLMSAGIGFTIGSKTNTQENQSQS 150
             |||||||| |||||||||||||||||||||||||||||||||||||||||||||||||||
orf117ng     NVISDNGTRIQAGNHVRIGTTQTQSQSETYHQTQKSGLMSAGIGFTIGSKTNTQENQSQS 600

orf117.pep   NEHTGSTVGSLKGDTTIVAGKHYEQIGSTVSSPEGNNTIYAQSIDIQAAHNKLNSNTTQT 210
             ||||||||||||||||||| ||||| || |||||||| |  || || || | ||| ||||
orf117ng     NEHTGSTVGSLKGDTTIVASKHYEQTGSNVSSPEGNNLISTQSMDIGAAQNQLNSKTTQT 660

orf117.pep   YEQKXLTVAFSSPVTDLAQQ 230
             |||| |||||||||||||||
orf117ng     YEQKGLTVAFSSPVTDLAQQAIAVAHKAAKQFDKAKTTALMPWRLPMQVGRLFKQAKAPK 720

An ORF117ng nucleotide sequence <SEQ ID 519> was predicted to encode a protein having amino
acid sequence <SEQ ID 520>:



1 ..LLVQTEKDGL HNEQTFGEKK VFSENGKLHN YWRARRKGHD ETGHREQNYT

51 LPEEITRDIS LGSFAYESHS KALSRHAPSQ GTELPQSNRD NIRTAKSNGI 

101 SLPYTPNSFT PLPGSSLYII NPANKGYLVE TDPRFANYRQ WLGSDYMLGS 

151 LKLDPNNLHK RLGDGYYEQR LINEQIAELT GHRRLDGYQN DEEQFKALMD 

201 NGATAARSMN LSVGIALSAE QAAQLTSDIV WLVQKEVKLP DGGTQTVLMP 

251 QVYVRVKNGG IDGKGALLSG SNTQINVSGS LKNSGTIAGR NALIINTDTL 

301 DNIGGRIHAQ KSAVTATQDI NNIGGILSAE QTLLLNAGNN INNQSTAKSS 

351 QNAQGSSTYL DRMAGIYITG KEKGVLAAQA GKDINIIAGQ ISNQSDQGQT 

401 RLQAGRDINL DTVQTGKYQE IHFDADNHTI RGSTKEVGSS IQTKGDVTLL 

451 SGNNLNAKAA EVGSAKGTLA VYAKNDITIS SGIHAGQVDD ASKHTGRSGG

501 GNKLVITDKA QSHHETAQSS TFEGKQVVLQ AGNDANILGS NVISDNGTRI

551 QAGNHVRIGT TQTQSQSETY HQTQKSGLMS AGIGFTIGSK TNTQENQSQS 

601 NEHTGSTVGS LKGDTTIVAS KHYEQTGSNV SSPEGNNLIS TQSMDIGAAQ 

651 NQLNSKTTQT YEQKGLTVAF SSPVTDLAQQ AIAVAHKAAK QFDKAKTTAL 

701 MPWRLPMQVG RLFKQAKAPK K* 

Further work revealed the following gonococcal partial DNA sequence <SEQ ID 521>: 



1 TTGCTTGTGC AAACAGAAAA AGACGGTTTG CATAACGAGC AAACCTTTGG

51 CGAGAAGAAA GTCTTCAGCG AAAATGGTAA GTTGCACAAC TACTGGCGTG

101 CGCGTCGTAA AGGACATGAT GAAACAGGGC ATCGTGAACA AAATTATACT

151 TTGCCGGAGG AAATCACACG CGACATTTCA CTGGGTTCAT TTGCCTATGA

201 ATCGCATAGC AAAGCATTAA GCCGTCATGC GCCCAGCCAA GGCACTGAGT

251 TGCCACAAAG TAACCGGGAT AATATCCGTA CTGCGAAAAG CAACGGTATT

301 TCGCTACCCT ATACGCCCAA TTCTTTTACC CCATTACCCG GCAGCAGCTT

351 ATACATTATC AATCCTGCCA ATAAAGGCTA TCTTGTTGAA ACCGATCCAC

401 GCTTTGCCAA CTACCGTCAA TGGTTGGGTA GTGACTATAT GCTGGGCAGC

451 CTCAAACTAG ACCCAAACAA TTTACATAAA CGTTTGGGTG ATGGTTATTA

501 CGAGCAACGT TTAATCAATG AACAAATCGC AGAGCTGACA GGGCATCGTC

551 GTTTAGACGG TTATCAAAAC GACGAAGAAC AATTTAAAGC CTTAATGGAT

601 AATGGCGCGA CTGCGGCACG TTCGATGAAT CTCAGCGTTG GCATTGCATT

651 AAGTGCCGAG CAAGCAGCGC AACTGACCAG CGATATTGTT TGGTTGGTAC

701 AAAAAGAAGT TAAACTTCCT GATGGCGGCA CACAAACCGT ATTGATGCCA

751 CAGGTTTATG TACGCGTTAA AAATGGCGGC ATAGACGGTA AAGGTGCATT

801 GTTGTCAGGC AGCAATACAC AAATCAATGT TTCAGGCAGC CTGAAAAACT

851 CAGGCACGAT TGCAGGGCGC AATGCGCTTA TTATCAATAC CGATACGCTA

901 GACAATATCG GTGGGCGTAT TCATGCGCAA AAATCAGCGG TTACGGCCAC

951 ACAAGACATC AATAATATTG GCGGCATTCT TTCTGCCGAA CAGACATTAT

1001 TGCTCAATGC GGGTAACAAC ATCAACAACC AAAGCACGGC CAAGAGCAGT

1051 CAAAATGCAC AAGGTAGCAG CACCTACCTA GACCGAATGG CAGGTATTTA

1101 TATCACAGGC AAAGAAAAAG GTGTTTTAGC AGCGCAGGCA GGCAAAGACA

1151 TCAACATCAT TGCCGGTCAA ATCAGCAATC AATCAGATCA AGGGCAAACC 

1201 CGGCTGCAGG CAGGACGCGA CATTAACCTG GATACGGTAC AAACCGGCAA 

1251 ATATCAAGAA ATCCATTTTG ATGCCGATAA CCATACCATC CGAGGTTCAA 

1301 CGAACGAAGT CGGCAGCAGC ATTCAAACAA AAGGCGATGT TACCCtatTG 

1351 TCAGGGAATA ATCTCAATGC CAAAGCTGCC GAAGTCGGCA GCGCAAAAGG 

1401 CACACTTGCC GTGTATGCTA AAAATGACAT TACTATCAGC TCAGGCATCC 

1451 ATGCCGGCCA AGTTGATGAT GCGTCCAAAC ATACAGGCAG AAGCGGCGGC 

1501 GGTAATAAAT TAGTCATTAC CGATAAAGCC CAAAGTCATC ACGAAACTGC 

1551 TCAAAGCAGC ACCTTTGAAG GCAAGCAAGT TGTATTGCAG GCAGGAAACG 

1601 ATGCCAACAT CCTTGGCAGT AATGTTATTT CCGATAATGG CACCCGGATT 

1651 CAAGCAGGCA ATCATGTTCG CATTGGTACA ACCCAAACTC AAAGCCAAAG 

1701 CGAAACCTAT CATCAAACCC AAAAATCAGG ATTGATGAGT GCAGGTATCG

1751 GCTTCACTAT TGGCAGCAAG ACAAACACAC AAGAAAACCA ATCCCAAAGC 

1801 AACGAACATA CAGGCAGTAC CGTAGGCAGC CTGAAAGGCG ATACCACCAT 

1851 TGTTGCAAGC AAACACTACG AACAAACCGG CAGCAACGTT TCCAGCCCTG 

1901 AGGGCAACAA CCTTATCAGC ACGCAAAGTA TGGATATTGG CGCAGCACAA 

1951 AACCAATTAA ACAGCAAAAC CACCCAAACC TACGAACAAA AAGGCTTAAC 

2001 GGTGGCATTC AGTTCGCCCG TTACCGATTT GGCACAACAA GCGATTGCCG 

2051 TAGCACACAA AGCAGCAAAC AAGTCGGACA AAGCAAAAAC GACCGCGTTA 

2101 ATGCCATGGC GGCTGCCAAT GCAGGTTGGC AGGCCTATCA AACAGGCAAA 

2151 GGCGCACAAA ACTTAG 

This corresponds to the amino acid sequence <SEQ ID 522; ORF117ng-1>:

1 LLVQTEKDGL HNEQTFGEKK VFSENGKLHN YWRARRKGHD ETGHREQNYT 

51 LPEEITRDIS LGSFAYESHS KALSRHAPSQ GTELPQSNRD NIRTAKSNGI 

101 SLPYTPNSFT PLPGSSLYII NPANKGYLVE TDPRFANYRQ WLGSDYMLGS 

151 LKLDPNNLHK RLGDGYYEQR LINEQIAELT GHRRLDGYQN DEEQFKALMD 

201 NGATAARSMN LSVGIALSAE QAAQLTSDIV WLVQKEVKLP DGGTQTVLMP 

251 QVYVRVKNGG IDGKGALLSG SNTQINVSGS LKNSGTIAGR NALIINTDTL 

301 DNIGGRIHAQ KSAVTATQDI NNIGGILSAE QTLLLNAGNN INNQSTAKSS

351 QNAQGSSTYL DRMAGIYITG KEKGVLAAQA GKDINIIAGQ ISNQSDQGQT 

401 RLQAGRDINL DTVQTGKYQE IHFDADNHTI RGSTNEVGSS IQTKGDVTLL 

451 SGNNLNAKAA EVGSAKGTLA VYAKNDITIS SGIHAGQVDD ASKHTGRSGG 

501 GNKLVITDKA QSHHETAQSS TFEGKQVVLQ AGNDANILGS NVISDNGTRI

551 QAGNHVRIGT TQTQSQSETY HQTQKSGLMS AGIGFTIGSK TNTQENQSQS

601 NEHTGSTVGS LKGDTTIVAS KHYEQTGSNV SSPEGNNLIS TQSMDIGAAQ

651 NQLNSKTTQT YEQKGLTVAF SSPVTDLAQQ AIAVAHKAAK KSDKAKTTAL

701 MPWRLPMQVG RPIKQAKAHK T*

ORF117ng-1 shows the same 90% identity over a 230aa overlap with ORF117. In addition, it
shows homology with a secreted N.meningitidis protein in the database:

gi|2623258 (AF030941) putative secreted protein [Neisseria meningitidis] Length = 2273

Score = 604 bits (1541), Expect = e-172

Identities = 325/678 (47%), Positives = 449/678 (65%), Gaps = 22/678 (3%)



Query: 1 LLVQTEKDGLHNEQTFGEKKVFSENGKLHNYWRARRKGHDETGHREQNYTLPEEITRDIS 60 

L+V T + L N++T G K + ++ G LH Y R +KG D TG+ Y E++ I 
Sbjct: 739 LIVGTPESALDNDETLGTKTI-TDKGDLHRYHRHHKKGRDSTGYSRSPYEPAPEVS-SIR 796

Query: 61 LGSFAYESHSKALSRHAPSQGTELPQSNRDNIRTAKSNGISLPYTPNSFTPLPGSSLYII 120 

+G AY+ + AP Q +++P + + NGI +T LP SSL+ I 

Sbjct: 797 MGISAYKGY APQQASDIPGTV VPWAENGIHPTFT LPNSSLFAI 840 

Query: 121 NPANKGYLVETDPRFANYRQWLGSDYMLGSLKLDPNNLHKRLGDGYYEQRLINEQIAELT 180

P NKGYL+ETDP F +YR+WLGS YML +L+ DPN++HKRLGDGYYEQ+L+NEQIA+LT
Sbjct: 841 APNNKGYLIETDPAFTDYRKWLGSGYMLAALQQDPNHIHKRLGDGYYEQKLVNEQIAKLT 900

Query: 181 GHRRLDGYQNDEEQFKALMDNGATAARSMNLSVGIALSAEQAAQLTSDIVWLVQKEVKLP 240 

G+RRLDGY NDEEQFKALMDNG T A+ + L+ GIALSAEQ A+LTSDIVWL + V LP 
Sbjct: 901 GYRRLDGYTNDEEQFKALMDNGITIAKELQLTPGIALSAEQVARLTSDIVWLENETVTLP 960 

Query: 241 DGGTQTVLMPQVYVRVKNGGIDGKGALLSGSNTQINVSGSLKN-SGTIAGRNALIINTDT 299 

DG TQTVL P+VYVR + ++G+GALLSGS I SG+++N G lAGR ALI+N 
Sbjct: 961 DGTTQTVLKPKVYVRARPKDMNGQGALLSGSWDIG-SGAIENRGGLIAGREALILNAQN 1019 



Query: 300 LDNIGGRIHAQKSAVTATQDINNIGGILSAEQTLLLNAGNNINNQSTAKSSQNAQGSSTY 359

+N+G++ ADINGIAE LLL A NNI ++S +S+QN QGS

Sbjct: 1020 IKNLQGDLQGKNIFAAAGSDITNTGSI-GAENALLLKASNNIESRSETRSNQNEQGSVRN 1078

Query: 360 LDRMAGIYITGKEKGVLAAQAGKDINIIAGQISNQSDQGQTRLQAGRDINLDTVQTGKYQ 419 

+ R+AGIY+TG++ G + AG +I + A +++NQS+ GQT L AG DI DT + Q
Sbjct: 1079 IGRVAGIYLTGRQNGSVLLDAGNNIVLTASELTNQSEDGQTVLNAGGDIRSDTTGISRNQ 1138

Query: 420 EIHFDADNHTIRGSTNEVGSSIQTKGDVTLLSGNNLNAKAAEVGSAKGTLAVYAKNDITI 479 

FD+DN+ IR NEVGS+I+T+G+++L + ++ +AAEVGS +G L + A DI + 
Sbjct: 1139 NTIFDSDNYVIRKEQNEVGSTIRTRGNLSLNAKGDIRIRAAEVGSEQGRLKLAAGRDIKV 1198 

Query: 480 SSGIHAGQVDDASKHTGRSGGGNKLVITDKAQSHHETAQSSTFEGKQVVLQAGNDANILG 539

+G + +DA K+TGRSGGG K +T ++ + A S T +GK+++L +G D + G 
Sbjct: 1199 EAGKAHTETEDALKYTGRSGGGIKQKMTRHLKNQNGQAVSGTLDGKEIILVSGRDITVTG 1258 



Query: 540 SNVISDNGTRIQAGNHVRIGTTQTQSQSETYHQTQKSGLM-SAGIGFTIGSKTNTQENQS 598

SN+I+DN T + A N++ + +T+S+S ++ +KSGLM S GIGFT GSK +TQ N+S
Sbjct: 1259 SNIIADNHTILSAKNNIVLKAAETRSRSAEMNKKEKSGLMGSGGIGFTAGSKKDTQTNRS 1318

Query: 599 QSNEHTGSTVGSLKGDTTIVASKHYEQTGSNVSSPEGNNLISTQSMDIGAAQNQLNSKTT 658

++ HT S VGSL G+T I A KHY QTGS +SSP+G+ IS+ + I AAQN+ + ++
Sbjct: 1319 ETVSHTESVVGSLNGNTLISAGKHYTQTGSTISSPQGDVGISSGKISIDAAQNRYSQESK 1378

Query: 659 QTYEQKGLTVAFSSPVTD 676

Q YEQKG+TVA S PV +
Sbjct: 1379 QVYEQKGVTVAISVPVVN 1396

Based on this analysis, it is predicted that the proteins from N.meningitidis and N.gonorrhoeae, and
their epitopes, could be useful antigens for vaccines or diagnostics, or for raising antibodies.



Example 63 



The following partial DNA sequence was identified in N.meningitidis <SEQ ID 523>: 

1 ATGATTTACA TCGTACTGTT TCTAGCTGTC GTCCTCGCCG TTGTCGCCTA 

51 CAACATGTAT CAGGAAAACC AATACCGCAA AAAAGTGCGC GACCAGTTCG 

101 GACACTCCGA CAAAGATGCC CTGCTCAACA GCAwAACCAG CCATGTCCGC 

151 GACGGCAAAC CGTCCGGCGG GTCAGTCATG ATGCCGAAAC CCCAACCGGC 

201 GGTCAAAAAA ACGGCAAAAC CCCAAGACCC CGyCATGCGC AACCTGCAAG 

251 AACAGGATGC CGTCTACATC GCCAAGCAGA AACAGGCAAA AGCCTCCCCG 

301 TTCAAAACCG AAATCGAAAC CGCCTTGGAA GAAAGCGGCA TTATCGGCAA 

351 CTCCGCCCAC ACCGTTTCCG AACCCCAAAC CGGACATTCC GCAACGAAAC 

401 CTGCCGACGC GTCGGCAAAA CCTGCACCCG TTCCGCAAAC ACCTGCAAAA 

451 CCGCTGATTA CGCTCAAAGA ACTGTCAAAA GTCGAATTAT CCTGGTTTGA 

501 CGTGCGCATC GACTTCATCT CCTAT... 

This corresponds to the amino acid sequence <SEQ ID 524; ORF119>:

1 MIYIVLFLAV VLAVVAYNMY QENQYRKKVR DQFGHSDKDA LLNSXTSHVR

51 DGKPSGGSVM MPKPQPAVKK TAKPQDPXMR NLQEQDAVYI AKQKQAKASP

101 FKTEIETALE ESGIIGNSAH TVSEPQTGHS ATKPADASAK PAPVPQTPAK

151 PLITLKELSK VELSWFDVRI DFISY...

Further work revealed the complete nucleotide sequence <SEQ ID 525>: 



1 ATGATTTACA TCGTACTGTT TCTAGCTGTC GTCCTCGCCG TTGTCGCCTA 

51 CAACATGTAT CAGGAAAACC AATACCGCAA AAAAGTGCGC GACCAGTTCG 

101 GACACTCCGA CAAAGATGCC CTGCTCAACA GCAAAACCAG CCATGTCCGC 

151 GACGGCAAAC CGTCCGGCGG GTCAGTCATG ATGCCGAAAC CCCAACCGGC 

201 GGTCAAAAAA ACGGCAAAAC CCCAAGACCC CGCCATGCGC AACCTGCAAG 

251 AACAGGATGC CGTCTACATC GCCAAGCAGA AACAGGCAAA AGCCTCCCCG 

301 TTCAAAACCG AAATCGAAAC CGCCTTGGAA GAAAGCGGCA TTATCGGCAA 

351 CTCCGCCCAC ACCGTTTCCG AACCCCAAAC CGGACATTCC GCACCGAAAC 

401 CTGCCGACGC GCCGGCAAAA CCTGCACCCG TTCCGCAAAC ACCTGCAAAA 

451 CCGCTGATTA CGCTCAAAGA ACTGTCAAAA GTCGAATTAC CCTGGTTTGA 

501 CGTGCGCTTC GACTTCATCT CCTATATCGC GCTGACCGAA GCCAAAGAAC 

551 TGCACGCACT GCCGCGCCTT TCCAACCGCT GCCGCTACCA GATTGTCGGC 

601 TGCACCATGG ACGACCATTT CCAGATTGCC GAACCCATCC CGGGCATCCG 



651 CTATCAGGCA TTTATCGTGG GTATTCAGGC AGTCAGCCGC AACGGACTTG

701 CCTCGCAGGA AGAACTCTCC GCATTCAACC GCCAGGTGGA TGCATTCGCA

751 CACAGCATGG GCGGTCAGAC GCTGCACACC GACCTTGCCG CCTTTATCGA

801 AGTGGCTTCC GCACTGGACG CATTCTGCGC GCGCGTCGAC CAGACTATCG

851 CCATCCATTT GGTTTCCCCG ACCAGCATCA GCGGCGTAGA ACTGCGTTCC

901 GCCGTAACGG GCGTGGGTTT CGTTTTGGAA GACGACGGCG CGTTCCACTA

951 TACCGACACG TCGGGCTCGA CCATGTTCTC CATCTGCTCG CTCAACAACG

1001 AGCCGTTTAC CAATGCCCTT TTGGACAACC AGTCCTATAA AGGCTTCAGT

1051 ATGCTGCTCG ACATCCCGCA CTCTCCGGCA GGCGAAAAAA CCTTCGACGA

1101 TTTGTTTATG GATTTGGCGG TACGCCTGTC CGGCCAGTTG AACCTGAATC

1151 TGGTCAACGA CAAAATGGAA GAAGTTTCGA CCCAATGGCT CAAAGACGTG

1201 CGCACTTATG TATTGGCTCG TCAGTCCGAG ATGCTCAAAG TCGGTATCGA

1251 ACCGGGCGGC AAAACCGCAT TGCGCCTGTT CTCCTAA

This encodes a protein having amino acid sequence <SEQ ID 528>:



1 MIYIVLFLAA VLAVVAYNMY QENQYRKKVR DQFGHSDKDA LLNSKTSHVR

51 DGKPSGGPVM MPKPQPAVKK TAKSQDPAMR NLQEQDAVYI AKQKQAKASP 

101 FKTEIETALE ESGIIGNSAH TVPEPQTGHS APKPADAPAK PVPVPQTPAK 

151 PLITLKELSK VELPWFDVRF DFISYIALTE AKELHALPRL SNRCRYQIVG 

201 CTMDDHFQIA EPIPGIRYQA FIVGIQAVSR NGLASQEELS AFNRQVDAFA 

251 HSMGGQTLHT DLAAFIEVAS ALDAFCARVD QTIAIHLVSP TSISGVELRS 

301 AVTGVGFVLE DDGAFHYTDT SGSTMFSICS LNNEPFTNAL LDNQSYKGFS 

351 MLLDIPHSPA GEKTFDDLFM DLAVRLSGQL NLNLVNDKME EVSTQWLKDV 

401 RTYVLARQSE MLKVGIEPGG KTALRLFS* 

ORF119a and ORF119-1 show 98.6% identity in 428 aa overlap:



                     10        20        30        40        50        60
orf119a.pep  MIYIVLFLAAVLAVVAYNMYQENQYRKKVRDQFGHSDKDALLNSKTSHVRDGKPSGGPVM
             ||||||||| ||||||||||||||||||||||||||||||||||||||||||||||| ||
orf119-1     MIYIVLFLAVVLAVVAYNMYQENQYRKKVRDQFGHSDKDALLNSKTSHVRDGKPSGGSVM
                     10        20        30        40        50        60

                     70        80        90       100       110       120
orf119a.pep  MPKPQPAVKKTAKSQDPAMRNLQEQDAVYIAKQKQAKASPFKTEIETALEESGIIGNSAH
             ||||||||||||| ||||||||||||||||||||||||||||||||||||||||||||||
orf119-1     MPKPQPAVKKTAKPQDPAMRNLQEQDAVYIAKQKQAKASPFKTEIETALEESGIIGNSAH
                     70        80        90       100       110       120

                    130       140       150       160       170       180
orf119a.pep  TVPEPQTGHSAPKPADAPAKPVPVPQTPAKPLITLKELSKVELPWFDVRFDFISYIALTE
             || |||||||||||||||||| ||||||||||||||||||||||||||||||||||||||
orf119-1     TVSEPQTGHSAPKPADAPAKPAPVPQTPAKPLITLKELSKVELPWFDVRFDFISYIALTE
                    130       140       150       160       170       180

                    190       200       210       220       230       240
orf119a.pep  AKELHALPRLSNRCRYQIVGCTMDDHFQIAEPIPGIRYQAFIVGIQAVSRNGLASQEELS
             ||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
orf119-1     AKELHALPRLSNRCRYQIVGCTMDDHFQIAEPIPGIRYQAFIVGIQAVSRNGLASQEELS
                    190       200       210       220       230       240

                    250       260       270       280       290       300
orf119a.pep  AFNRQVDAFAHSMGGQTLHTDLAAFIEVASALDAFCARVDQTIAIHLVSPTSISGVELRS
             |||||||||| |||||||||||||||||||||||||||||||||||||||||||||||||
orf119-1     AFNRQVDAFAQSMGGQTLHTDLAAFIEVASALDAFCARVDQTIAIHLVSPTSISGVELRS
                    250       260       270       280       290       300

                    310       320       330       340       350       360
orf119a.pep  AVTGVGFVLEDDGAFHYTDTSGSTMFSICSLNNEPFTNALLDNQSYKGFSMLLDIPHSPA
             ||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
orf119-1     AVTGVGFVLEDDGAFHYTDTSGSTMFSICSLNNEPFTNALLDNQSYKGFSMLLDIPHSPA
                    310       320       330       340       350       360

                    370       380       390       400       410       420
orf119a.pep  GEKTFDDLFMDLAVRLSGQLNLNLVNDKMEEVSTQWLKDVRTYVLARQSEMLKVGIEPGG
             ||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
orf119-1     GEKTFDDLFMDLAVRLSGQLNLNLVNDKMEEVSTQWLKDVRTYVLARQSEMLKVGIEPGG
                    370       380       390       400       410       420



429 
651 GTATCAGGCA TTTATCGTGG GTATTCAGGC AGTCAGCCGC AACGGACTTG

701 CCTCGCAGGA AGAACTCTCC GCATTCAACC GCCAGGTGGA CGCATTCGCA 

751 CAAAGCATGG GCGGTCAGAC GCTGCACACC GACCTTGCCG CCTTTATCGA 

801 AGTGGCTTCC GCACTGGACG CATTCTGCGC GCGCGTCGAC CAGACCATCG 

851 CCATCCATTT GGTTTCCCCG ACCAGCATCA GCGGCGTAGA ACTGCGTTCC 

901 GCCGTAACGG GCGTGGGTTT CGTTTTGGAA GACGACGGCG CGTTCCACTA 

951 TACCGACACG TCGGGCTCGA CCATGTTCTC CATCTGCTCG CTCAACAACG 

1001 AGCCGTTTAC CAACGCCCTT TTGGACAACC AGTCCTACAA AGGCTTCAGT 

1051 ATGCTGCTCG ACATCCCGCA CTCTCCGGCA GGCGAAAAAA CCTTCGACGA 

1101 TTTGTTTATG GATTTGGCGG TACGCCTGTC CGGCCAGTTG AACCTGAATC 

1151 TGGTCAACGA CAAAATGGAA GAAGTTTCGA CCCAATGGCT CAAAGACGTG

1201 CGCACTTATG TATTGGCGCG TCAGTCCGAG ATGCTCAAAG TCGGTATCGA 

1251 ACCGGGCGGC AAAACCGCAT TGCGCCTGTT CTCCTAA 

This corresponds to the amino acid sequence <SEQ ID 526; ORF119-1>:

1 MIYIVLFLAV VLAVVAYNMY QENQYRKKVR DQFGHSDKDA LLNSKTSHVR

51 DGKPSGGSVM MPKPQPAVKK TAKPQDPAMR NLQEQDAVYI AKQKQAKASP 

101 FKTEIETALE ESGIIGNSAH TVSEPQTGHS APKPADAPAK PAPVPQTPAK 

151 PLITLKELSK VELPWFDVRF DFISYIALTE AKELHALPRL SNRCRYQIVG 

201 CTMDDHFQIA EPIPGIRYQA FIVGIQAVSR NGLASQEELS AFNRQVDAFA 

251 QSMGGQTLHT DLAAFIEVAS ALDAFCARVD QTIAIHLVSP TSISGVELRS 

301 AVTGVGFVLE DDGAFHYTDT SGSTMFSICS LNNEPFTNAL LDNQSYKGFS 

351 MLLDIPHSPA GEKTFDDLFM DLAVRLSGQL NLNLVNDKME EVSTQWLKDV

401 RTYVLARQSE MLKVGIEPGG KTALRLFS* 
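
A quick consistency check on a complete ORF is that its nucleotide length, stop codon included, is three times the protein length plus three. The arithmetic is sketched below. `encoded_length` is a hypothetical helper name, and the 1287-base figure is read off the final row of the complete nucleotide listing (position 1251 plus 37 bases).

```python
def encoded_length(nucleotides: int) -> int:
    """Residues encoded by a complete ORF whose length includes the stop codon."""
    if nucleotides < 3 or nucleotides % 3 != 0:
        raise ValueError("a complete ORF has a length divisible by three")
    return nucleotides // 3 - 1  # the final codon is the stop


# Final listing row starts at 1251 and carries 37 bases, so 1250 + 37 = 1287.
print(encoded_length(1287))  # 428
```

This agrees with the 428 residues shown in the amino acid listing, whose last row starts at position 401 and carries 28 residues before the stop.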

Computer analysis of this amino acid sequence gave the following results: 



Homology with a predicted ORF from N.meningitidis (strain A)

ORF119 shows 93.7% identity over a 175aa overlap with an ORF (ORF119a) from strain A of
N.meningitidis:



                     10        20        30        40        50        60
orf119.pep   MIYIVLFLAVVLAVVAYNMYQENQYRKKVRDQFGHSDKDALLNSXTSHVRDGKPSGGSVM
             ||||||||| |||||||||||||||||||||||||||||||||| |||||||||||| ||
orf119a      MIYIVLFLAAVLAVVAYNMYQENQYRKKVRDQFGHSDKDALLNSKTSHVRDGKPSGGPVM
                     10        20        30        40        50        60

                     70        80        90       100       110       120
orf119.pep   MPKPQPAVKKTAKPQDPXMRNLQEQDAVYIAKQKQAKASPFKTEIETALEESGIIGNSAH
             ||||||||||||| ||| ||||||||||||||||||||||||||||||||||||||||||
orf119a      MPKPQPAVKKTAKSQDPAMRNLQEQDAVYIAKQKQAKASPFKTEIETALEESGIIGNSAH
                     70        80        90       100       110       120

                    130       140       150       160       170
orf119.pep   TVSEPQTGHSATKPADASAKPAPVPQTPAKPLITLKELSKVELSWFDVRIDFISY
             || |||||||| ||||| ||| ||||||||||||||||||||| ||||| |||||
orf119a      TVPEPQTGHSAPKPADAPAKPVPVPQTPAKPLITLKELSKVELPWFDVRFDFISYIALTE
                    130       140       150       160       170       180

orf119a      AKELHALPRLSNRCRYQIVGCTMDDHFQIAEPIPGIRYQAFIVGIQAVSRNGLASQEELS
                    190       200       210       220       230       240

The complete length ORF119a nucleotide sequence <SEQ ID 527> is:



1 ATGATTTACA TCGTACTGTT CCTCGCCGCC GTCCTCGCCG TTGTCGCCTA

51 CAATATGTAT CAGGAAAACC AATACCGCAA AAAAGTGCGC GACCAGTTCG

101 GGCACTCCGA CAAAGATGCC CTGCTCAACA GCAAAACCAG CCATGTCCGC

151 GACGGCAAAC CGTCCGGCGG GCCAGTCATG ATGCCGAAAC CCCAACCGGC

201 GGTCAAAAAA ACGGCAAAAT CCCAAGACCC CGCCATGCGC AACCTGCAAG

251 AGCAGGATGC CGTCTACATC GCCAAGCAGA AACAGGCAAA AGCCTCCCCG

301 TTCAAAACCG AAATCGAAAC CGCCTTGGAA GAAAGCGGCA TTATCGGCAA

351 CTCCGCCCAC ACCGTTCCCG AACCCCAAAC CGGACATTCC GCACCAAAAC

401 CTGCCGACGC GCCGGCAAAA CCTGTTCCCG TTCCGCAAAC GCCGGCAAAA

451 CCGCTGATTA CGCTCAAAGA GCTGTCGAAG GTCGAGCTGC CCTGGTTTGA

501 CGTGCGCTTC GACTTCATCT CTTATATCGC GCTGACCGAA GCCAAAGAAC

551 TGCACGCACT GCCGCGCCTT TCCAACCGCT GCCGCTACCA GATTGTCGGC

601 TGCACCATGG ACGACCATTT CCAGATTGCC GAACCCATCC CGGGCATCCG

orf119a.pep  KTALRLFSX
             |||||||||
orf119-1     KTALRLFSX



Homology with a predicted ORF from N.gonorrhoeae

ORF119 shows 93.1% identity over a 175aa overlap with a predicted ORF (ORF119ng) from
N.gonorrhoeae:



orf119.pep   MIYIVLFLAVVLAVVAYNMYQENQYRKKVRDQFGHSDKDALLNSXTSHVRDGKPSGGSVM 60
             ||||||||| |||||||||||||||||||||||||||||||||| |||||||||||| ||
orf119ng     MIYIVLFLAAVLAVVAYNMYQENQYRKKVRDQFGHSDKDALLNSKTSHVRDGKPSGGPVM 60

orf119.pep   MPKPQPAVKKTAKPQDPXMRNLQEQDAVYIAKQKQAKASPFKTEIETALEESGIIGNSAH 120
             |||||||||| |||||  ||||||||||||||||||||||||||||||||| ||||||||
orf119ng     MPKPQPAVKKPAKPQDSAMRNLQEQDAVYIAKQKQAKASPFKTEIETALEEIGIIGNSAH 120

orf119.pep   TVSEPQTGHSATKPADASAKPAPVPQTPAKPLITLKELSKVELSWFDVRIDFISY 175
             ||||||||||| ||||| ||| ||||||||||||||||||||| ||||| |||||
orf119ng     TVSEPQTGHSAPKPADAPAKPVPVPQTPAKPLITLKELSKVELPWFDVRFDFISYIALTE 180

The complete length ORF119ng nucleotide sequence <SEQ ID 529> is:



1 ATGATTTACA TCGTACTGTT CCTCGCCGCC GTCCTCGCCG TTGTCGCCTA 

51 CAATATGTAT CAGGAAAACC AATACCGCAA AAAAGTGCGC GACCAGTTCG 

101 GACACTCCGA CAAAGATGCC CTGCTCAACA GCAAAACCAG CCATGTCCGC 

151 GACGGCAAAC CGTCCGGCGG GCCAGTCATG ATGCCGAAAC CCCAACCGGC 

201 GGTCAAAAAA CCGGCCAAAC CCCAAGACTC CGCCATGCGC AACCTGCAAG 

251 AACAGGATGC CGTCTACATC GCCAAGCAGA AACAGGCAAA AGCCTCCCCG 

301 TTCAAAACCG AAATCGAAAC CGCCTTGGAA GAAATCGGCA TTATCGGCAA 

351 CTCCGCCCAC ACCGTTTCCG AACCCCAAAC CGGACATTCC GCACCGAAAC 

401 CTGCCGACGC GCCGGCAAAA CCCGTTCCCG TTCCGCAAAC GCCGGCAAAA 

451 CCGCTGATTA CGCTCAAAGA GCTGTCGAAG GTCGAGCTGC CCTGGTTTGA 

501 CGTGCGCTtc gACTTCATCT CCTATATCGC GCTGACCGAA GCCAAAGAAC 

551 TGCACGCACT GCCGCGCCTT tccAACCGCT GCCGCTACCA GATTGTCGGC 

601 TGCACCATGG ACGACCATTT CCAGATTGCC GAACCCATCC CGGGCATCCG 

651 CTATCAGGCA TTTATCGTGG GTATCCAGGC AGTCAGCCGC AACGGACTTG 

701 CCTCGCAGGA AGAACTCTCC GCATTCAACC GCCAGGCGGA CGCATTCGCA 

751 CAAAGCATGG GCGGTCAGAC GCTGCACACC GACCTTGCCG CCTTTATCGA 

801 AGTGGCTTCC GCACTGGACG CATTCTGCGC GCGCGTCGAC CAGACCATCG 

851 CCATCCATTT GGTTTCGCCG ACCAGCATCA GCGGCGTAGA ACTGCGTTCC 

901 GCCGTAACGG GCGTGGGTTT CGTTTTGGAA GACGACGGCG CGTTCCACTA 

951 TACCGACACG TCGGGCTCGA CCATGTTCTC CATCTGCTCG CTCAACAACG 

1001 AGCCGTTTAC CAATGCCCTT TTGGACAACC AGTCCTACAA AGGCTTCAGT 

1051 ATGCTGCTCG ACATCCCGCA CTCTCCGGCA GGCGAAAAAA CCTTCGACGA 

1101 TTTGTTTATG GATTTGGCGG TACGCCTGTC CGGTCAGTTG AACCTGAATC 

1151 TGGTCAACGA CAAAATGGAA GAAGTTTCGA CCCAATGGCT CAAAGACGTA 

1201 CGCACTTATG TATTGGCGCG TCAGTCCGAG ATGCTCAAAG TCGGTATCGA 

1251 ACCGGGCGGC AAAACCGCCC TGCGCCTGTT TTCATAA 

This encodes a protein having amino acid sequence <SEQ ID 530>: 



1 MIYIVLFLAA VLAVVAYNMY QENQYRKKVR DQFGHSDKDA LLNSKTSHVR

51 DGKPSGGPVM MPKPQPAVKK PAKPQDSAMR NLQEQDAVYI AKQKQAKASP 

101 FKTEIETALE EIGIIGNSAH TVSEPQTGHS APKPADAPAK PVPVPQTPAK 

151 PLITLKELSK VELPWFDVRF DFISYIALTE AKELHALPRL SNRCRYQIVG 

201 CTMDDHFQIA EPIPGIRYQA FIVGIQAVSR NGLASQEELS AFNRQADAFA 

251 QSMGGQTLHT DLAAFIEVAS ALDAFCARVD QTIAIHLVSP TSISGVELRS 

301 AVTGVGFVLE DDGAFHYTDT SGSTMFSICS LNNEPFTNAL LDNQSYKGFS 

351 MLLDIPHSPA GEKTFDDLFM DLAVRLSGQL NLNLVNDKME EVSTQWLKDV 

401 RTYVLARQSE MLKVGIEPGG KTALRLFS* 

ORF119ng and ORF119-1 show 98.4% identity over 428 aa overlap:



                     10        20        30        40        50        60
orf119ng     MIYIVLFLAAVLAVVAYNMYQENQYRKKVRDQFGHSDKDALLNSKTSHVRDGKPSGGPVM
             ||||||||| ||||||||||||||||||||||||||||||||||||||||||||||| ||
orf119-1     MIYIVLFLAVVLAVVAYNMYQENQYRKKVRDQFGHSDKDALLNSKTSHVRDGKPSGGSVM
                     10        20        30        40        50        60

                     70        80        90       100       110       120
orf119ng     MPKPQPAVKKPAKPQDSAMRNLQEQDAVYIAKQKQAKASPFKTEIETALEEIGIIGNSAH
             |||||||||| ||||| |||||||||||||||||||||||||||||||||| ||||||||
orf119-1     MPKPQPAVKKTAKPQDPAMRNLQEQDAVYIAKQKQAKASPFKTEIETALEESGIIGNSAH
                     70        80        90       100       110       120

                    130       140       150       160       170       180
orf119ng     TVSEPQTGHSAPKPADAPAKPVPVPQTPAKPLITLKELSKVELPWFDVRFDFISYIALTE
             ||||||||||||||||||||| ||||||||||||||||||||||||||||||||||||||
orf119-1     TVSEPQTGHSAPKPADAPAKPAPVPQTPAKPLITLKELSKVELPWFDVRFDFISYIALTE
                    130       140       150       160       170       180

                    190       200       210       220       230       240
orf119ng     AKELHALPRLSNRCRYQIVGCTMDDHFQIAEPIPGIRYQAFIVGIQAVSRNGLASQEELS
             ||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
orf119-1     AKELHALPRLSNRCRYQIVGCTMDDHFQIAEPIPGIRYQAFIVGIQAVSRNGLASQEELS
                    190       200       210       220       230       240

                    250       260       270       280       290       300
orf119ng     AFNRQADAFAQSMGGQTLHTDLAAFIEVASALDAFCARVDQTIAIHLVSPTSISGVELRS
             ||||| ||||||||||||||||||||||||||||||||||||||||||||||||||||||
orf119-1     AFNRQVDAFAQSMGGQTLHTDLAAFIEVASALDAFCARVDQTIAIHLVSPTSISGVELRS
                    250       260       270       280       290       300

                    310       320       330       340       350       360
orf119ng     AVTGVGFVLEDDGAFHYTDTSGSTMFSICSLNNEPFTNALLDNQSYKGFSMLLDIPHSPA
             ||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
orf119-1     AVTGVGFVLEDDGAFHYTDTSGSTMFSICSLNNEPFTNALLDNQSYKGFSMLLDIPHSPA
                    310       320       330       340       350       360

                    370       380       390       400       410       420
orf119ng     GEKTFDDLFMDLAVRLSGQLNLNLVNDKMEEVSTQWLKDVRTYVLARQSEMLKVGIEPGG
             ||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
orf119-1     GEKTFDDLFMDLAVRLSGQLNLNLVNDKMEEVSTQWLKDVRTYVLARQSEMLKVGIEPGG
                    370       380       390       400       410       420

                   429
orf119ng     KTALRLFSX
             |||||||||
orf119-1     KTALRLFSX



Based on this analysis, including the presence of a putative leader sequence in the gonococcal
protein, it is predicted that the proteins from N.meningitidis and N.gonorrhoeae, and their epitopes,
could be useful antigens for vaccines or diagnostics, or for raising antibodies.
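
A putative leader sequence is commonly suggested by a short, strongly hydrophobic stretch at the N-terminus. The text does not state which prediction method was used, so the following is only a rough illustration: it averages Kyte-Doolittle hydropathy values over a window of residues. The function name `mean_hydropathy`, the 20-residue window, and the choice of this particular scale are all assumptions.

```python
# Kyte-Doolittle hydropathy index for the 20 standard amino acids.
KYTE_DOOLITTLE = {
    "A": 1.8, "R": -4.5, "N": -3.5, "D": -3.5, "C": 2.5,
    "Q": -3.5, "E": -3.5, "G": -0.4, "H": -3.2, "I": 4.5,
    "L": 3.8, "K": -3.9, "M": 1.9, "F": 2.8, "P": -1.6,
    "S": -0.8, "T": -0.7, "W": -0.9, "Y": -1.3, "V": 4.2,
}


def mean_hydropathy(protein: str, window: int = 20) -> float:
    """Average hydropathy of the first `window` residues (unknown symbols skipped)."""
    residues = [r for r in protein[:window].upper() if r in KYTE_DOOLITTLE]
    if not residues:
        raise ValueError("no scorable residues in the window")
    return sum(KYTE_DOOLITTLE[r] for r in residues) / len(residues)


# Residues 1-20 and 21-40 of the gonococcal protein, as translated from
# its nucleotide listing.
n_terminus = "MIYIVLFLAAVLAVVAYNMY"
following = "QENQYRKKVRDQFGHSDKDA"
print(mean_hydropathy(n_terminus) > mean_hydropathy(following))  # True
```

A clearly positive average at the N-terminus followed by a hydrophilic stretch is consistent with, but does not by itself establish, a leader sequence.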



Example 64

The following partial DNA sequence was identified in N.meningitidis <SEQ ID 531>:

1 ..GCGCGGCACG GCACGGAAGA TTTCTTCATG AACAACAGCG ACAC.ATCAG 

51 GCAGATAGTC GAAAGCACCA CCGGTACGAT GAAGCTGCTG ATTTCCTCCA 

101 TCGCCCTGAT TTCATTGGTA GTCGGCGGCA TCGGCGTGAT GAACATCATG 

151 CTGGTGTCCG TTACCGAGCG CACCAAAGAA ATCGGCATAC GGATGGCAAT

201 CGGCGCGCGG CGCGGCAATA TTTyGCAGCA GTTTTTGATT GAGGCGGTGT 

251 TAATCTGCGT CATCGGCGGT TTGGTCGGCG TGGGTTTGTC CGCCGCCGTC 

301 AGCCTCGTGT TCAATCATTT TGTAACCGAC TTCCCGATGG ACATTTCCGC 

351 CATGTCCGTC ATCGGCGCGG TCGCCTGTTC GACCGGAATC GGCATCGCGT 

401 TCGGCTTTAT GCCTGCCAAT AAAGCAGCCA AACTCAATCC GATAGACGCA

451 TTGGCACAGG ATTGA 
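
The listings in this section label each row with the position of its first base and split the row into blocks of ten. Before such a listing can be translated or compared, the labels and spacing have to be removed. A minimal sketch, assuming that layout, is given below. `join_listing` is a hypothetical helper, and dropping the leading or trailing dots that mark a partial sequence is a simplifying assumption.

```python
def join_listing(listing: str) -> str:
    """Concatenate a numbered, block-spaced sequence listing into one string."""
    rows = []
    for line in listing.splitlines():
        tokens = line.split()
        if tokens and tokens[0].isdigit():
            tokens = tokens[1:]  # drop the row's position label
        rows.append("".join(tokens))
    # Dots at either end mark a partial sequence; internal dots are kept.
    return "".join(rows).strip(".")


listing = """
401 TCGGCTTTAT GCCTGCCAAT AAAGCAGCCA AACTCAATCC GATAGACGCA

451 TTGGCACAGG ATTGA
"""
sequence = join_listing(listing)
print(len(sequence), sequence[-3:])  # 65 TGA
```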

This corresponds to the amino acid sequence <SEQ ID 532; ORF134>: 



1 ..ARHGTEDFFM NNSDXIRQIV ESTTGTMKLL ISSIALISLV VGGIGVMNIM

51 LVSVTERTKE IGIRMAIGAR RGNIXQQFLI EAVLICVIGG LVGVGLSAAV

101 SLVFNHFVTD FPMDISAMSV IGAVACSTGI GIAFGFMPAN KAAKLNPIDA

151 LAQD* 
Further work revealed the complete nucleotide sequence <SEQ ID 533>:

1 ATGTCGGTGC AAGCAGTATT GGCGCACAAA ATGCGTTCGC TTCTGACGAT 

51 GCTCGGCATC ATCATCGGTA TCGCGTCGGT GGTTTCCGTC GTCGCATTGG 

101 GCAATGGTTC GCAGAAAAAA ATCCTTGAAG ACATCAGTTC GATAGGGACG 

151 AACACCATCA GCATCTTCCC GGGGCGCGGC TTCGGCGACA GGCGCAGCGG 

201 CAGGATTAAA ACCCTGACCA TAGACGACGC AAAAATCATC GCCAAACAAA 

251 GCTACGTTGC TTCCGCCACG CCCATGACTT CGAGCGGCGG CACGCTGACT 

301 TACCGCAACA CCGACCTGAC CGCCTCGCTT TACGGCGTGG GCGAACAATA 

351 TTTCGACGTG CGCGGACTGA AGCTGGAAAC GGGGCGGCTG TTTGACGAAA 

401 ACGATGTGAA AGAAGACGCG CAGGTCGTCG TCATCGACCA AAATGTCAAA 

451 GACAAACTCT TTGCGGACTC GGATCCGTTG GGTAAAACCA TTTTGTTCAG

501 GAAACGCCCC TTGACCGTCA TCGGCGTGAT GAAAAAAGAC GAAAACGCTT 

551 TCGGCAATTC CGACGTGCTG ATGCTTTGGT CGCCCTATAC GACGGTGATG 

601 CACCAAATCA CAGGCGAGAG CCACACCAAC TCCATCACCG TCAAAATCAA 

651 AGACAATGCC AATACCCAGG TTGCCGAAAA AGGGCTGACC GATCTGCTCA 

701 AAGCGCGGCA CGGCACGGAA GATTTCTTCA TGAACAACAG CGACAGCATC 

751 AGGCAGATAG TCGAAAGCAC CACCGGTACG ATGAAGCTGC TGATTTCCTC 

801 CATCGCCCTG ATTTCATTGG TAGTCGGCGG CATCGGCGTG ATGAACATCA 

851 TGCTGGTGTC CGTTACCGAG CGCACCAAAG AAATCGGCAT ACGGATGGCA 

901 ATCGGCGCGC GGCGCGGCAA TATTTTGCAG CAGTTTTTGA TTGAGGCGGT 

951 GTTAATCTGC GTCATCGGCG GTTTGGTCGG CGTGGGTTTG TCCGCCGCCG 

1001 TCAGCCTCGT GTTCAATCAT TTTGTAACCG ACTTCCCGAT GGACATTTCC 

1051 GCCATGTCCG TCATCGGCGC GGTCGCCTGT TCGACCGGAA TCGGCATCGC 

1101 GTTCGGCTTT ATGCCTGCCA ATAAAGCAGC CAAACTCAAT CCGATAGACG 

1151 CATTGGCACA GGATTGA 

This corresponds to the amino acid sequence <SEQ ID 534; ORF134-1>:

1 MSVQAVLAHK MRSLLTMLGI IIGIASVVSV VALGNGSQKK ILEDISSIGT

51 NTISIFPGRG FGDRRSGRIK TLTIDDAKII AKQSYVASAT PMTSSGGTLT

101 YRNTDLTASL YGVGEQYFDV RGLKLETGRL FDENDVKEDA QVVVIDQNVK

151 DKLFADSDPL GKTILFRKRP LTVIGVMKKD ENAFGNSDVL MLWSPYTTVM

201 HQITGESHTN SITVKIKDNA NTQVAEKGLT DLLKARHGTE DFFMNNSDSI

251 RQIVESTTGT MKLLISSIAL ISLVVGGIGV MNIMLVSVTE RTKEIGIRMA

301 IGARRGNILQ QFLIEAVLIC VIGGLVGVGL SAAVSLVFNH FVTDFPMDIS

351 AMSVIGAVAC STGIGIAFGF MPANKAAKLN PIDALAQD*

Computer analysis of this amino acid sequence gave the following results: 

Homology with the hypothetical protein o648 of E.coli (accession number AE000189)
ORF134 and o648 protein show 45% aa identity in 153aa overlap:

Orf134: 2 RHGTEDFFMNNSDXIRQIVESTTGTMKXXXXXXXXXXXVVGGIGVMNIMLVSVTERTKEI 61

RHG +DFF N D + + VE TT T++ VVGGIGVMNIMLVSVTERT+EI
o648: 496 RHGKKDFFTWNMDGVIJCTVEKTTRTLQLFLTLVAVISLVVGGIGVMNIMLVSVTERTREI 555

Orf134: 62 GIRMAIGARRGNIXQQFLIEAXXXXXXXXXXXXXXXXXXXXXFNHFVTDFPMDISAMSVI 121

GIRMA+GAR ++ QQFLIEA F+ + + S ++++

o648: 556 GIRMAVGARASDVLQQFLIEAVLVCLVGGALGITLSLLIAFTLQLFLPGWEIGFSPLALL 615

Orf134: 122 GAVACSTGIGIAFGFMPANKAAKLNPIDALAQD 154

A CST GI FG++PA AA+L+P+DALA++ 
o648: 616 LAFLCSTVTGILFGWLPARNAARLDPVDALARE 648 

Homology with a predicted ORF from N.meningitidis (strain A)

ORF134 shows 98.7% identity over a 154aa overlap with an ORF (ORF134a) from strain A of
N.meningitidis:

10 20 30 

orfl34 .pep ARHGTEDFFMNNSDXIRQIVESTTGTMKLL 

I I I t I I I [ I I M I I I I I t t I t I I I ! I I t I 
orfl34a GESHTNSITVKIKDNANTQVAEKGLTDLLKARHGTEDFFMNNSDSIRQIVESTTGTMKLL 
210 220 230 240 250 260 

40 50 60 70 80 90 
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orf 134 . pep - ISSIALISLWGGIGVMNIMLVSVTERTKEIGIRMAIGARRGNIXQQFLIEAVLICVIGG 
I I I I I I M M i I I I I I I I i I i I I I I I I I I i I I I I I I I I I I M I t I I I I M I I I I I I I I t 
orf 134a ISSIALISLWGGIGVMNIMLVSVTERTKEIGIRMAIGARRGNILQQFLIEAVLICVIGG 
270 280 290 300 310 320 



100 110 120 130 140 150 

orf 134. pep LVGVGLSAAVSLVFNHFVTDFPMDISAMSVIGAVACSTGIGIAFGFMPANKAAKLNPIDA 
I M I i t I I I I I I I M I I I i I I I I I I I I I I I i I I I I I I I I I I I I I I I I I t M I I I I i I I I I 
orf 134a LVGVGLSAAVSLVFNHFVTDFPMDISAMSVIGAVACSTGIGIAFGFMPANKAAKLNPIDA
330 340 350 360 370 380 



orf 134. pep LAQDX 
n I I I 

orfl34a LAQDX 

The complete length ORF134a nucleotide sequence <SEQ ID 535> is: 

1 ATGTCGGTGC AAGCAGTATT GGCGCACAAA ATGCGTTCGC TTCTGACGAT 

51 GCTCGGCATC ATCATCGGTA TCGCTTCGGT TGTCTCCGTC GTCGCATTGG 

101 GCAACGGTTC GCAGAAAAAA ATCCTTGAAG ACATCAGTTC GATAGGGACG 

151 AACACCATCA GCATCTTCCC AGGGCGCGGC TTCGGCGACA GGCGCAGCGG 

201 CAGGATTAAA ACCCTGACCA TAGACGACGC AAAAATCATC GCCAAACAAA 

251 GCTACGTTGC TTCCGCCACG CCCATGACTT CGAGCGGCGG CACGCTGACT 

301 TACCGCAATA CCGACCTGAC CGCTTCTTTG TACGGTGTGG GCGAACAATA 

351 TTTCGACGTG CGCGGGCTGA AGCTGGAAAC GGGGCGGCTG TTTGACGAAA 

401 ACGATGTGAA AGAAGACGCG CAGGTCGTCG TCATCGACCA AAATGTCAAA 

451 GACAAACTCT TTGCGGACTC GGATCCGTTG GGTAAAACCA TTTTGTTCAG 

501 GAAACGCCCC TTGACCGTCA TCGGCGTGAT GAAAAAAGAC GAAAACGCTT 

551 TCGGCAATTC CGACGTGCTG ATGCTTTGGT CGCCCTATAC GACGGTGATG 

601 CACCAAATCA CAGGCGAGAG CCACACCAAC TCCATCACCG TCAAAATCAA 

651 AGACAATGCC AATACCCAGG TTGCCGAAAA AGGGCTGACC GATCTGCTCA 

701 AAGCGCGGCA CGGCACGGAA GATTTCTTCA TGAACAACAG CGACAGCATC 

751 AGGCAGATAG TCGAAAGCAC CACCGGTACG ATGAAGCTGC TGATTTCCTC 

801 CATCGCCCTG ATTTCATTGG TAGTCGGCGG CATCGGCGTG ATGAACATCA 

851 TGCTGGTGTC CGTTACCGAG CGCACCAAAG AAATCGGCAT ACGGATGGCA 

901 ATCGGCGCGC GGCGCGGCAA TATTTTGCAG CAGTTTTTGA TTGAGGCGGT 

951 GTTAATCTGC GTCATCGGCG GTTTGGTCGG CGTGGGTTTG TCCGCCGCCG 

1001 TCAGCCTCGT GTTCAATCAT TTTGTAACCG ACTTCCCGAT GGACATTTCC 

1051 GCCATGTCCG TCATCGGCGC GGTCGCCTGT TCGACCGGAA TCGGCATCGC 

1101 GTTCGGCTTT ATGCCTGCCA ATAAAGCAGC CAAACTCAAT CCGATAGATG 

1151 CATTGGCGCA GGATTGA 

This encodes a protein having amino acid sequence <SEQ ID 536>: 



1 MSVQAVLAHK MRSLLTMLGI IIGIASVVSV VALGNGSQKK ILEDISSIGT

51 NTISIFPGRG FGDRRSGRIK TLTIDDAKII AKQSYVASAT PMTSSGGTLT

101 YRNTDLTASL YGVGEQYFDV RGLKLETGRL FDENDVKEDA QVVVIDQNVK

151 DKLFADSDPL GKTILFRKRP LTVIGVMKKD ENAFGNSDVL MLWSPYTTVM

201 HQITGESHTN SITVKIKDNA NTQVAEKGLT DLLKARHGTE DFFMNNSDSI

251 RQIVESTTGT MKLLISSIAL ISLVVGGIGV MNIMLVSVTE RTKEIGIRMA

301 IGARRGNILQ QFLIEAVLIC VIGGLVGVGL SAAVSLVFNH FVTDFPMDIS

351 AMSVIGAVAC STGIGIAFGF MPANKAAKLN PIDALAQD*



ORF134a and ORF134-1 show 100.0% identity in 388 aa overlap: 



orf134a.pep  MSVQAVLAHKMRSLLTMLGIIIGIASVVSVVALGNGSQKKILEDISSIGTNTISIFPGRG
             ||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
orf134-1     MSVQAVLAHKMRSLLTMLGIIIGIASVVSVVALGNGSQKKILEDISSIGTNTISIFPGRG

orf134a.pep  FGDRRSGRIKTLTIDDAKIIAKQSYVASATPMTSSGGTLTYRNTDLTASLYGVGEQYFDV
             ||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
orf134-1     FGDRRSGRIKTLTIDDAKIIAKQSYVASATPMTSSGGTLTYRNTDLTASLYGVGEQYFDV

orf134a.pep  RGLKLETGRLFDENDVKEDAQVVVIDQNVKDKLFADSDPLGKTILFRKRPLTVIGVMKKD
             ||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
orf134-1     RGLKLETGRLFDENDVKEDAQVVVIDQNVKDKLFADSDPLGKTILFRKRPLTVIGVMKKD

orf134a.pep  ENAFGNSDVLMLWSPYTTVMHQITGESHTNSITVKIKDNANTQVAEKGLTDLLKARHGTE
             ||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
orf134-1     ENAFGNSDVLMLWSPYTTVMHQITGESHTNSITVKIKDNANTQVAEKGLTDLLKARHGTE

orf134a.pep  DFFMNNSDSIRQIVESTTGTMKLLISSIALISLVVGGIGVMNIMLVSVTERTKEIGIRMA
             ||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
orf134-1     DFFMNNSDSIRQIVESTTGTMKLLISSIALISLVVGGIGVMNIMLVSVTERTKEIGIRMA

orf134a.pep  IGARRGNILQQFLIEAVLICVIGGLVGVGLSAAVSLVFNHFVTDFPMDISAMSVIGAVAC
             ||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
orf134-1     IGARRGNILQQFLIEAVLICVIGGLVGVGLSAAVSLVFNHFVTDFPMDISAMSVIGAVAC

orf134a.pep  STGIGIAFGFMPANKAAKLNPIDALAQDX
             |||||||||||||||||||||||||||||
orf134-1     STGIGIAFGFMPANKAAKLNPIDALAQDX



Homology with a predicted ORF from N.gonorrhoeae 

ORF134 shows 96.8% identity over a 154aa overlap with a predicted ORF (ORF134.ng) from N.
gonorrhoeae: 

orf 134. pep ARHGTEDFFMNNSDXIRQIVESTTGTMKLL 30 

I I I t I i I t I I I I I I IM:||llllllill 
orfl34ng GESHTNSITVKIKDNANTRVAEKGLAELLKARHGTEDFFMNNSDSIRQMVESTTGTMKLL 264 

orf 134 .pep ISSIALISLWGGIGVMNIMLVSVTERTKEIGIRMAIGARRGNIXQQFLIEAVLICVIGG 90 

I I I i I I I I t I t M I I t I t I I I I t I i I I I I t I M I t I I I I I [ I I t 11111111111:111 
orfl34ng ISSIALISLWGGIGVMNIMLVSVTERTKEIGIRMAIGARRGNILQQFLIEAVLICIIGG 324 

orf 134 .pep LVGVGLSAAVSLVFNHFVTDFPMDISAMSVIGAVACSTGIGIAFGFMPANKAAKLNPIDA 150 

I I I M I I I I I I I I I t I I I It I I I I I I I I I I I i I I t I M M 1 t I I I I I I 1 I I I I I I I I I t 
orfl34ng LVGVGLSAAVSLVFNHFVTDFPMDISAASVIGAVACSTGIGIAFGFMPANKAAKLNPIDA 384 



orf134.pep  LAQD 154
            ||||
orf134ng    LAQD 388



The complete length ORF134ng nucleotide sequence <SEQ ID 537> is: 



1 ATGTCGGTGC AAGCAGTATT GGCGCACAAA ATGCGTTCGC TTCTGACCAT 

51 GCTCGGCATC ATCATCGGTA TCGCTTCGGT TGTCTCCGTC GTCGCGCTGG 

101 GCAACGGTTC GCAGAAAAAA ATCCTCGAAG ACATCAGTTC GATGGGGACG 

151 AACACCATCA GCATCTTCCC CGGGCGCGGC TTCGGCGACA GGCGCAGCGG 

201 CAAAATCAAA ACCCTGACCA TAGACGACGC AAAAATCATC GCCAAACAAA 

251 GCTACGTTGC CTCCGCCACG CCCATGACTT CGAGCGGCGG CACGCTGACC 

301 TACCGCAATA CCGACCTGAC CGCTTCTTTG TACGGTGTGG GCGAACAATA 

351 TTTCGACGTG CGCGGGCTGA AGCTGGAAAC GGGGCGGCTG TTTGATGAGA 

401 ACGATGTGAA AGAAGACGCG CAAGTCGTCG TCATCGACCA AAATGTCAAA 

451 GACAAACTCT TTGCGGACTC GGATCCGTTG GGTAAAACCA TTTTGTTCAG 

501 GAAACGCCCC TTGACCGTCA TCGGCGTGAT GAT^WUVGAC GAAAACGCTT 

551 TCGGCAATTC CGACGTGCTG ATGCTTTGGT CGCCCTATAC GACGGTGATG 

601 CACCAAATCA CAGGCGAGAG CCACACCAAC TCCATCACCG TCAAAATCAA 

651 AGACAATGCC AATACCCGGG TTGCCGAAAA AGGGCTGGCC GAGCTGCTCA 

701 AAGCACGGCA CGGCACGGAA GACTTCTTTA TGAACAACAG CGACAGCATC 

751 AGGCAGATGG TCGAAAGCAC CACCGGTACG ATGAAGCTGC TGATTTCCTC 

801 CATCGCCCTG ATTTCATTGG TAGTCGGCGG CATCGGTGTG ATGAACATTA 

851 TGCTGGTGTC CGTTACCGAG CGCACCAAAG AAATCGGCAT ACGGATGGCA 

901 ATCGGCGCGC GGCGCGGCAA TATTTTGCAG CAGTTTTTGA TTGAGGCGGT 

951 GTTAATCTGC ATCATCGGAG GCTTGGTCGG CGTAGGTTTG TCCGCCGCCG 

1001 TCAGCCTCGT GTTCAATCAT TTTGTAACCG ATTTCCCGAT GGACATTTCG 

1051 GCGGCATCCG TTATCGGGGC GGTCGCCTGT TCGACCGGAA TCGGCATCGC 

1101 GTTCGGCTTT ATGCCTGCCA ATAAGGCAGC CAAACTCAAT CCGATAGATG 

1151 CATTGGCGCA GGATTGA 

This encodes a protein having amino acid sequence <SEQ ID 538>: 



1 MSVQAVLAHK MRSLLTMLGI IIGIASVVSV VALGNGSQKK ILEDISSMGT

51 NTISIFPGRG FGDRRSGKIK TLTIDDAKII AKQSYVASAT PMTSSGGTLT

101 YRNTDLTASL YGVGEQYFDV RGLKLETGRL FDENDVKEDA QVVVIDQNVK

151 DKLFADSDPL GKTILFRKRP LTVIGVMKKD ENAFGNSDVL MLWSPYTTVM

201 HQITGESHTN SITVKIKDNA NTRVAEKGLA ELLKARHGTE DFFMNNSDSI

251 RQMVESTTGT MKLLISSIAL ISLVVGGIGV MNIMLVSVTE RTKEIGIRMA

301 IGARRGNILQ QFLIEAVLIC IIGGLVGVGL SAAVSLVFNH FVTDFPMDIS

351 AASVIGAVAC STGIGIAFGF MPANKAAKLN PIDALAQD*

ORF134ng and ORF134-1 show 97.9% identity in 388 aa overlap: 

orfl34ng MSVQAVLAHKMRSLLTMLGIIIGIASWSVVALGNGSQKKILEDISSMGTNTISIFPGRG 
I I I I I I I I I I I I I I I I I t I I I I I I I I I M I I I I I I I I f I t I 1 I I [ I I : i I t I i I I I I I I I 
orf 134-1 MSVQAVLAHKMRSLLTMLGIIIGIASWSWALGNGSQKKILEDISSIGTNTISIFPGRG 

orf134ng FGDRRSGKIKTLTIDDAKIIAKQSYVASATPMTSSGGTLTYRNTDLTASLYGVGEQYFDV
tillllhiMIIIIIMIIIIIIIIIIIIIIMIMlMtlMIIIMINIttlilll 
orf 134-1 FGDRRSGRIKTLTIDDAKIIAKQSYVASATPMTSSGGTLTYRNTDLTASLYGVGEQYFDV 

orfl34ng RGLKLETGRLFDENDVKEDAQVWIDQNVKDKLFADSDPLGKTILFRKRPLTVIGVMKKD 
I I 1 I I i I M i I [ I I I t I I t I I M I I ] I i I I I I I I It I I I I i 1 I t I I I I I i I I I I I I I i I I 
orf 134-1 RGLKLETGRLFDENDVKEDAQVVVIDQNVKDKLFADSDPLGKTILFRKRPLTVIGVMKKD



orf134ng ENAFGNSDVLMLWSPYTTVMHQITGESHTNSITVKIKDNANTRVAEKGLAELLKARHGTE

liltlllllllltilittlilMIIII[IIIMIIIII)lll:|ll)II::lilllllll 
orf 134-1 ENAFGNSDVLMLWSPYTTVMHQITGESHTNSITVKIKDNANTQVAEKGLTDLLKARHGTE 

orfl34ng DFFMNNSDSIRQMVESTTGTMKLLISSIALISLWGGIGVMNIMLVSVTERTKEIGIRMA 
20 i t i I I t I I I I t I : I I I It M i I I I t M [ I I I i I I I I I t I I I I I I I t I I 1 I I I I I 1 I I i I I 

orf 134-1 DFFMNNSDSIRQIVESTTGTMKLLISSIALISLWGGIGVMNIMLVSVTERTKEIGIRMA 

orfl34ng IGARRGNILQQFLIEAVLICIIGGLVGVGLSAAVSLVFNHFVTDFPMDISAASVIGAVAC 

I M I I I I I I I I I t II I t i I I: I I I I I I I I I II I I I 1 I I I I I 1 I I I I I I I I I II I t I I II 
orf 134-1 IGARRGNILQQFLIEAVLICVIGGLVGVGLSAAVSLVFNHFVTDFPMDISAMSVIGAVAC

orfl34ng STGIGIAFGFMPANKAAKLNPIDALAQDX 
I I I I I I I t I II I I i II II I I II t I I t I I I 
orf134-1 STGIGIAFGFMPANKAAKLNPIDALAQDX

ORF134ng also shows homology to an E.coli ABC transporter:

sp|P75831|YBJZ_ECOLI HYPOTHETICAL ABC TRANSPORTER ATP-BINDING PROTEIN YBJZ >gi5 
(AE000189) o648; similar to YBBA_HAEIN SW: P45247 [Escherichia coli] Length = 
648 

Score = 297 bits (753), Expect = 6e-80 
Identities = 162/389 (41%), Positives = 230/389 (58%), Gaps = 1/389 (0%)

Query: 1   MSVQAVLAHKMRSLLTMLXXXXXXXXXXXXXXLGNGSQKKILEDISSMGTNTISIFPGRG 60
           M+ +A+ A+KMR+LLTML +G+ +++ +L DI S+GTNTI ++PG+
Sbjct: 260

Query: 61  FGDRRSGKIKTLTIDDAKIIAKQSYVASATPMTSSGGTLTYRNTDLTASLYGVGEQYFDV 120
           FGD + L DD I KQ +VASATP S L Y N D+ AS GV YF+V
Sbjct: 320

Query: 121
Sbjct: 380

Query: 180
           ++ FG+S VL +W PY+T+ ++ G+S NSITV++K+ ++ 7VE+ L LL RHG
Sbjct: 440

Query: 240
           +DFF N D + + VE TT T++ WGGIGVMNIMLVSVTERT+EIGIRM
Sbjct: 500

Query: 300
           A+GAR ++LQQFLIE F+ + + S +++ A
Sbjct: 560

Query: 360
           CST GI FG++PA AA+L+P+DALA++
Sbjct: 620






Based on this analysis, including the presence of the leader peptide and transmembrane regions in 
the gonococcal protein, it is predicted that these proteins from N.meningitidis and N.gonorrhoeae,
and their epitopes, could be useful antigens for vaccines or diagnostics, or for raising antibodies. 

Example 65 

The following partial DNA sequence was identified in N.meningitidis <SEQ ID 539>: 

1 ..GGGACGGGAG CGATGCTGCT GCTGTTTTAC GCGGTAACGA T.CTGCCTTT 

51 GGCCACTGGC GTTACCCTGA GTTACACCTC GTCGATTTTT TTGGCGGTAT 

101 TTTCCTTCCT GATTTTGAAA GAACGGATTT CCGTTTACAC GCAGGCGGTG 

151 CTGCTCCTTG GTTTTGCCGG CGTGGTATTG CTGCTTAATC CCTCGTTCCG 

201 CAGGGGTCAG GAAACGGCGG CACTCGCCGG GCTGGCGGGC GGCGCGATGT 

251 CCGGCTGGGC GTATTTGAAA GTGCGCGAAC TGTCTTTGGC GGGCGAACCC 

301 GGCTGGCGCG TCGTGTTTTA CCTTTCCGTG ACAGGTGTGG CGATGTCGTC 

351 GGTTTGGGCG ACGCTGACCG GCTGGCACAC CCTGTCCTTT CCATCGGCAG 

401 TTTATCTGTC GTGCATCGGC GTGTCCGCGC TGATTGCCCA ACTGTCGATG 

451 ACGCGCGCCT ACAAAGTCGG CGACAAATTC ACGGTTGCCT CGCTTTCCTA 

501 TATGACCGTC GTTTTTTCCG CTCTGTCTGC CGCATTTTTT CTGGGCGAAG 

551 AGCTTTTCTG GCAGGAAATA CTCGGTATGT GCATCATCAT CCTCAGCGGT 

601 ATTTTGA 

This corresponds to the amino acid sequence <SEQ ID 540; ORF135>: 



1 ..GTGAMLLLFY AVTILPLATG VTLSYTSSIF LAVFSFLILK ERISVYTQAV
51 LLLGFAGVVL LLNPSFRSGQ ETAALAGLAG GAMSGWAYLK VRELSLAGEP
101 GWRVVFYLSV TGVAMSSVWA TLTGWHTLSF PSAVYLSCIG VSALIAQLSM
151 TRAYKVGDKF TVASLSYMTV VFSALSAAFF LGEELFWQEI LGMCIIISAV
201 F* 

Further work revealed the complete nucleotide sequence <SEQ ID 541>: 



1 ATGGATACCG CAAAAAAAGA CATTTTAGGA TCGGGCTGGA TGCTGGTGGC 

51 GGCGGCCTGC TTTACCATTA TGAACGTATT GATTAAAGAG GCATCGGCAA 

101 AATTTGCCCT CGGCAGCGGC GAATTGGTCT TTTGGCGCAT GCTGTTTTCA 

151 ACCGTTGCGC TCGGGGCTGC CGCCGTATTG CGTCGGGACA mCTTCCGCAC 

201 GCCCCATTGG ATWU^CCACT TAAACCGCAG TATGGTCGGG ACGGGGGCGA 

251 TGCTGCTGCT GTTTTACGCG GTAACGCATC TGCCTTTGGC CACTGGCGTT 

301 ACCCTGAGTT ACACCTCGTC GATTTTTTTG GCGGTATTTT CCTTCCTGAT 

351 TTTGAAAGAA CGGATTTCCG TTTACACGCA GGCGGTGCTG CTCCTTGGTT 

401 TTGCCGGCGT GGTATTGCTG CTTAATCCCT CGTTCCGCAG CGGTCAGGAA 

451 ACGGCGGCAC TCGCCGGGCT GGCGGGCGGC GCGATGTCCG GCTGGGCGTA 

501 TTTGAAAGTG CGCGAACTGT CTTTGGCGGG CGAACCCGGC TGGCGCGTCG 

551 TGTTTTACCT TTCCGTGACA GGTGTGGCGA TGTCGTCGGT TTGGGCGACG 

601 CTGACCGGCT GGCACACCCT GTCCTTTCCA TCGGCAGTTT ATCTGTCGTG 

651 CATCGGCGTG TCCGCGCTGA TTGCCCAACT GTCGATGACG CGCGCCTACA 

701 AAGTCGGCGA CAAATTCACG GTTGCCTCGC TTTCCTATAT GACCGTCGTT 

751 TTTTCCGCTC TGTCTGCCGC ATTTTTTCTG GGCGAAGAGC TTTTCTGGCA 

801 GGAAATACTC GGTATGTGCA TCATCATCCT CAGCGGTATT TTGAGCAGCA 

851 TCCGCCCCAC TGCCTTCAAA CAGCGGCTGC AATCCCTGTT CCGCCAAAGA 

901 TAA 

This corresponds to the amino acid sequence <SEQ ID 542; ORF135-1>:

1 MDTAKKDILG SGWMLVAAAC FTIMNVLIKE ASAKFALGSG ELVFWRMLFS

51 TVALGAAAVL RRDXFRTPHW KNHLNRSMVG TGAMLLLFYA VTHLPLATGV

101 TLSYTSSIFL AVFSFLILKE RISVYTQAVL LLGFAGVVLL LNPSFRSGQE

151 TAALAGLAGG AMSGWAYLKV RELSLAGEPG WRVVFYLSVT GVAMSSVWAT

201 LTGWHTLSFP SAVYLSCIGV SALIAQLSMT RAYKVGDKFT VASLSYMTVV

251 FSALSAAFFL GEELFWQEIL GMCIIILSGI LSSIRPTAFK QRLQSLFRQR

301 * 
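
The correspondence between a nucleotide listing and its amino acid listing can be checked by translating codon by codon. The following is a minimal sketch in Python, assuming the standard genetic code; the function name `translate` is hypothetical and not part of the source. Codons containing an ambiguous base (such as the `m` at position 191 of SEQ ID 541) are reported as `X`, which is consistent with the X at residue 64 of SEQ ID 542.

```python
# Standard genetic code, built from the conventional TCAG ordering.
BASES = "TCAG"
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    a + b + c: AMINO_ACIDS[16 * i + 4 * j + k]
    for i, a in enumerate(BASES)
    for j, b in enumerate(BASES)
    for k, c in enumerate(BASES)
}


def translate(dna: str) -> str:
    """Translate a DNA string in frame 1; codons not in the table become 'X'."""
    seq = "".join(dna.split()).upper()  # drop the spaces used in the listing
    usable = len(seq) - len(seq) % 3    # ignore a trailing partial codon
    return "".join(
        CODON_TABLE.get(seq[pos:pos + 3], "X") for pos in range(0, usable, 3)
    )


# First 30 nucleotides of SEQ ID 541 as printed above.
print(translate("ATGGATACCG CAAAAAAAGA CATTTTAGGA"))  # MDTAKKDILG
```

The stop codon is returned as `*`, matching the convention used in the listings.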

Computer analysis of this amino acid sequence gave the following results:

Homology with a predicted ORF from N.meningitidis (strain A)

ORF135 shows 99.0% identity over a 197aa overlap with an ORF (ORF135a) from strain A of N.
meningitidis: 



10 20 30
orf135.pep GTGAMLLLFYAVTILPLATGVTLSYTSSIF
I I I I I I I I I i I I ) I I It M I I ) I I I I I I I
orf135a STVALGAAAVLRRDTFRTPHWKNHLNRSMVGTGAMLLLFYAVTHLPLATGVTLSYTSSIF
50 60 70 80 90 100

40 50 60 70 80 90
orf135.pep LAVFSFLILKERISVYTQAVLLLGFAGVVLLLNPSFRSGQETAALAGLAGGAMSGWAYLK
I l-l M I I I I I I I f I i I I I I [ i I I M I I I I I I I I I I M t I i I I I I I I I I I I 1 I I I I I I M t
orf135a LAVFSFLILKERISVYTQAVLLLGFAGVVLLLNPSFRSGQETAALAGLAGGAMSGWAYLK
110 120 130 140 150 160

100 110 120 130 140 150
orf135.pep VRELSLAGEPGWRVVFYLSVTGVAMSSVWATLTGWHTLSFPSAVYLSCIGVSALIAQLSM
1 1 1 1 1 1 1 1 i 1 M 1 1 1 1 n 1 1 i 1 1 1 i 1 1 1 1 1 1 1 1 1 [ i 1 1 1 1 1 1 1 M 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1
orf135a VRELSLAGEPGWRVVFYLSVTGVAMSSVWATLTGWHTLSFPSAVYLSCIGVSALIAQLSM
170 180 190 200 210 220

160 170 180 190 200
orf135.pep TRAYKVGDKFTVASLSYMTVVFSALSAAFFLGEELFWQEILGMCIIISAVFX
t t I I I I I I I I I I I I I I I I I i t I I I I I I I I I I : I I I I I t I I i t I I i I I
orf135a TRAYKVGDKFTVASLSYMTVVFSALSAAFFLAEELFWQEILGMCIIILSGILSSIRPTAF
230 240 250 260 270 280

orf135a KQRLQSLFRQRX
290 300



The complete length ORF135a nucleotide sequence <SEQ ID 543> is: 



1 ATGGATACCG CAAAAAAAGA CATTTTAGGA TCGGGCTGGA TGCTGGTGGC

51 GGCGGCCTGC TTTACCATTA TGAACGTATT GATTAAAGAG GCATCGGCAA

101 AATTTGCCCT CGGCAGCGGC GAATTGGTCT TTTGGCGCAT GCTGTTTTCA

151 ACCGTTGCGC TCGGGGCTGC CGCCGTATTG CGTCGGGACA CCTTCCGCAC

201 GCCCCATTGG AAAAACCACT TAAACCGCAG TATGGTCGGG ACGGGGGCGA

251 TGCTGCTGCT GTTTTACGCG GTAACGCATC TGCCTTTGGC CACCGGCGTT

301 ACCCTGAGTT ACACCTCGTC GATTTTTTTG GCGGTATTTT CCTTCCTGAT

351 TTTGAAAGAA CGGATTTCCG TTTACACGCA GGCGGTGCTG CTCCTTGGTT

401 TTGCCGGCGT GGTATTGCTG CTTAATCCCT CGTTCCGCAG CGGTCAGGAA

451 ACGGCGGCAC TCGCCGGGCT GGCGGGCGGC GCGATGTCCG GCTGGGCGTA

501 TTTGAAAGTG CGCGAACTGT CTTTGGCGGG CGAACCCGGC TGGCGCGTCG

551 TGTTTTACCT TTCCGTGACA GGTGTGGCGA TGTCATCGGT TTGGGCGACG

601 CTGACCGGCT GGCACACCCT GTCCTTTCCA TCGGCAGTTT ATCTGTCGTG

651 CATCGGCGTG TCCGCGCTGA TTGCCCAACT GTCGATGACG CGCGCCTACA

701 AAGTCGGCGA CAAATTCACG GTTGCCTCGC TTTCCTATAT GACCGTCGTT

751 TTTTCCGCTC TGTCTGCCGC ATTTTTTCTG GCCGAAGAGC TTTTCTGGCA

801 GGAAATACTC GGTATGTGCA TCATCATCCT CAGCGGTATT TTGAGCAGCA

851 TCCGCCCCAC TGCCTTCAAA CAGCGGCTGC AATCCCTGTT CCGCCAAAGA

901 TAA



This encodes a protein having amino acid sequence <SEQ ID 544>: 



1 MDTAKKDILG SGWMLVAAAC FTIMNVLIKE ASAKFALGSG ELVFWRMLFS

51 TVALGAAAVL RRDTFRTPHW KNHLNRSMVG TGAMLLLFYA VTHLPLATGV

101 TLSYTSSIFL AVFSFLILKE RISVYTQAVL LLGFAGVVLL LNPSFRSGQE

151 TAALAGLAGG AMSGWAYLKV RELSLAGEPG WRVVFYLSVT GVAMSSVWAT

201 LTGWHTLSFP SAVYLSCIGV SALIAQLSMT RAYKVGDKFT VASLSYMTVV

251 FSALSAAFFL AEELFWQEIL GMCIIILSGI LSSIRPTAFK QRLQSLFRQR

301 *



ORF135a and ORF135-1 show 99.3% identity in 300 aa overlap: 



orf 135a . pep MDTAKKDILG SGWMLVAAAC FTIMNVLIKEASAKFALGSGELVFWRMLFSTVALGAAAVL 

M 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 i I i 1 1 1 1 1 1 1 1 1 1 1 1 1 1 i n I I t I I I i I i I I I I t I I I I I I I I I 

orf 135-1 MDTAKKDILGSGWMLVAAACFTIMNVLIKEASAKFALGSGELVFWRMLFSTVALGAAAVL

orf 135a . pep RRDTFRTPHWKNHLNRSMVGTGAMLLLFYAVTHLPLATGVTLSYTSSIFLAVFSFLILKE
I I I : I I I I I 1 I t I I I I I I t I I I I I I I M I I I I I M M i i t I t M It I I I I t M t I I I t t I 
orf 135-1 RRDXFRTPHWKNHLNRSMVGTGAMLLLFYAVTHLPLATGVTLSYTSSIFLAVFSFLILKE 



orf 135a . pep RISVYTQAVLLLGFAGVVLLLNPSFRSGQETAALAGLAGGAMSGWAYLKVRELSLAGEPG 
I I [ I t I I I i I I I I M I I M I M t I I I I I I I I I I i I I I M I I I I I I I I I I I t I I I I I I I I I 
orf 135-1 RISVYTQAVLLLGFAGWLLLNPSFRSGQETAALAGLAGGAMSGWAYLKVRELSLAGEPG 



orf 135a . pep WRWFYLSVTGVAMSSVWATLTGWHTLSFPSAVYLSCIGVSALIAQLSMTRAYKVGDKFT 
i I I I I I I 1 I I I I I I I I i I I I I I I 1! I t I I I I I I I I I I M t I I I I M i I i I I I I M I I M I 
orf 135-1 WRWFYLSVTGVAMSSVWATLTGWHTLSFPSAVYLSCIGVSALIAQLSMTRAYKVGDKFT 



orf 135a . pep VASLSYMTWFSALSAAFFLAEELFWQEILGMCIIILSGILSSIRPTAFKQRLQSLFRQR 
I I I I I I ) I I [ t I I M I I I I I : I I I I I I I t t I t I I ) I t I I I I I I I I I I I I I I I I I I ) I i t I 
orf 135-1 VASLSYMTWFSALSAAFFLGEELFWQEILGMCIIILSGILSSIRPTAFKQRLQSLFRQR 
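
The identity figure quoted for this alignment is the fraction of aligned positions carrying the same residue. As a rough sketch (the helper name `percent_identity` is hypothetical, and it assumes two already-aligned, gap-free strings of equal length), the calculation looks like this in Python. In the listings above, ORF135a and ORF135-1 appear to differ at two positions (T versus X at residue 64 and A versus G at residue 261), which gives 298/300, or 99.3%.

```python
def percent_identity(a: str, b: str) -> float:
    """Percentage of positions with identical residues in two aligned strings."""
    if len(a) != len(b):
        raise ValueError("aligned sequences must have equal length")
    matches = sum(1 for x, y in zip(a, b) if x == y)
    return 100.0 * matches / len(a)


# Residues 61-80 of ORF135a and ORF135-1 as printed in the alignment.
print(round(percent_identity("RRDTFRTPHWKNHLNRSMVG", "RRDXFRTPHWKNHLNRSMVG"), 1))  # 95.0

# Two mismatches over the full 300-residue overlap.
print(round(100.0 * 298 / 300, 1))  # 99.3
```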



Homology with a predicted ORF from N.gonorrhoeae

ORF135 shows 97% identity over a 201aa overlap with a predicted ORF (ORF135ng) from
N.gonorrhoeae:



orf135.pep GTGAMLLLFYAVTXLPLATGVTLSYTSSIF 30
I t I I I I I I I I I M i i t : I I ) I I I i I I I I 1
orf135ng STVTLGAAAVLRRDTFRTPHWKNHLNRSMVGTGAMLLLFYAVTHLPLTTGVTLSYTSSIF 335

orf135.pep LAVFSFLILKERISVYTQAVLLLGFAGVVLLLNPSFRSGQETAALAGLAGGAMSGWAYLK 90
I I I I I i t I t M I I I t I M I I I I I I I I M I I I I M I I I I [ I I I t M I t [ I I I I I I i I I I I
orf135ng LAVFSFLILKERISVYTQAVLLLGFAGVVLLLNPSFRSGQEPAALAGLAGGAMSGWAYLK 395

orf135.pep VRELSLAGEPGWRVVFYLSVTGVAMSSVWATLTGWHTLSFPSAVYLSCIGVSALIAQLSM 150
I i I I I I I I I i I M M I t I I : I I I I I I I I I t I I I M I I I I I I t It I I i I II I I I I I II t t
orf135ng VRELSLAGEPGWRVVFYLSATGVAMSSVWATLTGWHTLSFPSAVYLSGIGVSALIAQLSM 455

orf135.pep TRAYKVGDKFTVASLSYMTVVFSALSAAFFLGEELFWQEILGMCIIISAVF 201
I II i M I 1) I M I I I II I II I i I I I I I I I I II II I I I I I I I I I 1 I I II I : I
orf135ng TRAYKVGDKFTVASLSYMTVVFSALSAAFFLGEELFWQEILGMCIIISAAF 506



An ORF135ng nucleotide sequence <SEQ ID 545> was predicted to encode a protein having amino
acid sequence <SEQ ID 546>:

1 MPSEKAFRRH LRTASFQGLH LHHFHQKVGK CGIIGFGIHI FPTLLPAAQG

51 ILDIQLGLFR IDFAALAVYR RTQVDFIHTV IDGIASDQAF SEWQILRRL

101 NLGHFTDTHL IAQARRFIAD FGNIRPMRRG EAKTFCRCFR FDGIDGIHGD

151 FRQCGHINRL APGKDCRNGK RDKVFFHTEIH YNQVCLEKTN CSARKIKFRH

201 QKQAKTHSTS LAARFTIRPS LSQRPFMDTA KKDILGSGWM LVAAACFTVM

251 NVLIKEASAK FALGSGELVF WRMLFSTVTL GAAAVLRRDT FRTPHWKNHL

301 NRSMVGTGAM LLLFYAVTHL PLTTGVTLSY TSSIFLAVFS FLILKERISV

351 YTQAVLLLGF AGVVLLLNPS FRSGQEPAAL AGLAGGAMSG WAYLKVRELS

401 LAGEPGWRVV FYLSATGVAM SSVWATLTGW HTLSFPSAVY LSGIGVSALI

451 AQLSMTRAYK VGDKFTVASL SYMTVVFSAL SAAFFLGEEL FWQEILGMCI

501 IISAAF*



Further work revealed the following gonococcal sequence <SEQ ID 547>: 



1 ATGGATACCG CAAAAAAAGA CATTTTAGGA TCGGGCTGGA TGCTGGTGGC 

51 GGCGGCCTGC TTCACCGTTA TGAACGTATT GATTAAAGAG GCATCGGCAA 

101 AATTTGCCCT CGGCAGCGGC GAATTGGTCT TTTGGCGCAT GCTGTTTTCA 

151 ACCGTTACGC TCGGTGCTGC CGCCGTATTG CGGCGCGACA CCTTCCGCAC 

201 GCCCCATTGG AAAAACCACT TAAACCGCAG TATGGTCGGG ACGGGGGCGA 

251 TGCTGCTGCT GTTTTACGCG GTAACGCATC TGCCTTTGAC AACCGGCGTT 

301 ACCCTGAGTT ACACCTCGTC GATTTTTttg GCGGTATTTT CCTTCCTGAT 

351 TTTGAAAGAA CGGATTTCCG TTTACACGCA GGCGGTGCTG CTCCTTGGTT

401 TTGCCGGCGT GGTATTGCTG CTTAATCCCT CGTTCCGCAG CGGTCAGGAA 

451 CCGGCGGCAC TCGCCGGGCT GGCGGGCGGC GCGATGTCCG GCTGGGCGTA 

501 TTTGAAAGTG CGCGAACTGT CTTTGGCGGG CGAACCCGGC TGGCGCGTCG 

551 TGTTTTACCT TTCCGCAACC GGCGTGGCGA TGTCGTCggt ttgggcgacg 

601 Ctgaccggct ggCACAcccT GTCCTTTcca tcggcagttt ATCtgtCGGG 
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651 CATCGGCGTG tccgcgCtgA TTGCCCAaCT GtcgatgAcg cGCGcctaca 

701 aaGTCGGCGA CAAATTCACG GTTGCCTCGC tttcctaTAt gaccgtcGTC 

751 TTTTCCGCCC TGTCTGCCGC ATTTTTTCTg ggcgaagagc tttTCtggCA 

801 GGAAATACTC GGTATGTGCA TCATTAtccT CAGCGGCATT TTGAGCAGCA 

851 TCCGCCCCAT TGCCTTCAAA CAGCGGCTGC AAGCCCTCTT CCGCCAAAGA 

901 TAA 

This corresponds to the amino acid sequence <SEQ ID 548; ORF135ng-1>:

1 MDTAKKDILG SGWMLVAAAC FTVMNVLIKE ASAKFALGSG ELVFWRMLFS

51 TVTLGAAAVL RRDTFRTPHW KNHLNRSMVG TGAMLLLFYA VTHLPLTTGV

101 TLSYTSSIFL AVFSFLILKE RISVYTQAVL LLGFAGVVLL LNPSFRSGQE

151 PAALAGLAGG AMSGWAYLKV RELSLAGEPG WRVVFYLSAT GVAMSSVWAT

201 LTGWHTLSFP SAVYLSGIGV SALIAQLSMT RAYKVGDKFT VASLSYMTVV

251 FSALSAAFFL GEELFWQEIL GMCIIILSGI LSSIRPIAFK QRLQALFRQR

301 * 

ORF135ng-1 and ORF135-1 show 97.0% identity in 300 aa overlap:



orf 135ng-l . pep MDTAKKDILGSGWMLVAAACFTVMNVLIKEASAKFALGSGELVFWRMLFSTVTLGAAAVL 
I I I I I I t I I t I I I I I I t I I M I : I I I I I n M I I I I I I I I I I I I i I I t I I I I : I I i I t I I 
orf 135-1 MDTAKKDILGSGWMLVAAACFTIMNVLIKEASAKFALGSGELVFWRMLFSTVALGAAAVL 



orfl35ng-l.pep RRDTFRTPHWKNHLNRSMVGTGAMLLLFYAVTHLPLTTGVTLSYTSSIFLAVFSFLILKE 
I I t : I t i I { t I I I I I I I i I I t I t I I I I I I I I I I I 11: i t I I I I I t I I i n I t I I I I I I I I 
orf 135-1 RRDXFRTPHWKNHLNRSMVGTGAMLLLFYAVTHLPLATGVTLSYTSSIFLAVFSFLILKE 



orf 135ng-l . pep RISVYTQAVLLLGFAGWLLLNPSFRSGQEPAALAGLAGGAMSGWAYLKVRELSLAGEPG 

1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 n 1 1 1 M 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 n 1 1 1 1 1 1 1 

orf 135-1 RISVYTQAVLLLGFAGVVLLLNPSFRSGQETAALAGLAGGAMSGWAYLKVRELSLAGEPG



orf 135ng-l . pep WRWFYLSATGVAMSSVWATLTGWHTLSFPSAVYLSGIGVSALIAQLSMTRAYKVGDKFT 
I I I t I I I I : I I I I I i I i I I I I I I I I I I M M I M t I I I I i t I I i I I I I I I I t 1 I I I I I I 
orf 135-1 WRWFYLSVTGVAMSSVWATLTGWHTLSFPSAVYLSCIGVSALIAQLSMTRAYKVGDKFT 



orf 135ng-1 . pep VASLSYMTVVFSALSAAFFLGEELFWQEILGMCIIILSGILSSIRPIAFKQRLQALFRQR

1 1 1 1 1 1 1 1 1 1 1 1 M 1 1 1 1 1 ] f 1 1 1 1 1 1 1 1 i I n I I I I I I I I I t I I I I I I I M I : I I I i I 

orf 135-1 VASLSYMTWFSALSAAFFLGEELFWQEILGMCIIILSGILSSIRPTAFKQRLQSLFRQR 

Based on this analysis, including the presence of several putative transmembrane domains in the 
gonococcal protein, it is predicted that the proteins from N.meningitidis and N.gonorrhoeae, and
their epitopes, could be useful antigens for vaccines or diagnostics, or for raising antibodies. 
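
The source does not say how the putative transmembrane domains were predicted. One widely used heuristic, shown here only as an illustrative assumption, is a Kyte-Doolittle hydropathy scan: a window of about 19 residues with mean hydropathy above roughly 1.6 is flagged as a candidate membrane-spanning segment. The function name and default values below are hypothetical.

```python
# Kyte-Doolittle hydropathy index per residue.
KD = {
    "A": 1.8, "R": -4.5, "N": -3.5, "D": -3.5, "C": 2.5,
    "Q": -3.5, "E": -3.5, "G": -0.4, "H": -3.2, "I": 4.5,
    "L": 3.8, "K": -3.9, "M": 1.9, "F": 2.8, "P": -1.6,
    "S": -0.8, "T": -0.7, "W": -0.9, "Y": -1.3, "V": 4.2,
}


def hydrophobic_windows(seq: str, window: int = 19, threshold: float = 1.6) -> list:
    """Return 0-based start positions of windows whose mean hydropathy exceeds threshold."""
    hits = []
    for start in range(len(seq) - window + 1):
        segment = seq[start:start + window]
        # Unknown residues such as X contribute a neutral score of 0.0.
        mean = sum(KD.get(aa, 0.0) for aa in segment) / window
        if mean > threshold:
            hits.append(start)
    return hits


# Scan an excerpt of ORF135-1 (residues 81-120 of SEQ ID 542).
print(hydrophobic_windows("TGAMLLLFYAVTHLPLATGVTLSYTSSIFLAVFSFLILKE"))
```

Overlapping hits would normally be merged into a single segment; that step is omitted to keep the sketch short.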



Example 66 



The following DNA sequence was identified in N.meningitidis <SEQ ID 549>: 

1 ATGAAGCGGC GTATAGCCGT CTTCGTCCTG TTCCCGCAGA TAATCCGAGT 

51 TTTGGGACAA CTGTTGCCGA AAATCGTCAA TACAGTTCCG GCACATCGGA 

101 TGCTCTTCCA GATTTTCGGG ATGTTCTTTT TCTTCATACA CCAGCAATAT 

151 CTGCCCGGGA TCGCCGAAAT CGATTCCCCA TGCGGCATCG TGTTCGGTGC 

201 GCTCCTCTTC CGTCATCTGC CCGCGCATTG CCTGTATGGT AAAGCCGCCG 

251 TAGGGGATGC CgTTGCACAC GAACATCCAG TCGCTGATGT CGTCAACCGG 

301 AACGCAAACG cTTTCGCCTT GTTCGACATT GGTCAGTTCG CCsGGTTCAT 

351 TGTTCAGCAC ACCGTAAATA TAAAGACCGT CAAAATAAAT ATCGTCGATC 

401 CACATATGTT CGCAAATTTC GCCGTCTTCG CCGTCTTGGA AAAAAGGGAC 

451 TTTGACCATG GCAAAATCCA AGGCGGAAAT AATGCGGCGG CGTTCCCAAA

501 AAAGcTCGCG CCAAAAATAT TTGAATGTTT TACGGGCGCG TTCGTCGGCA 

551 CGGTTTACCG GTTCGTCTGC CTGTTCTACA TAATAAATGA CGGAATCGCC 

601 CATCATATCT GCTCCTCAAC GTGTACGGTA TCTGTTTGCA CCTTACTGCG 

651 GCTTTCTgcC kTCGGCATCC GATTCGGATT TGAAAAGTTC mmrwyATTCG 

701 GAATAG 

This corresponds to the amino acid sequence <SEQ ID 550; ORF136>: 



1 MKRRIAVFVL FPQIIRVLGQ LLPKIVNTVP AHRMLFQIFG MFFFFIHQQY 
51 LPGIAEIDSP CGIVFGALLF RHLPAHCLYG KAAVGDAVAH EHPVADVVNR
101 NANAFALFDI GQFAXFIVQH TVNIKTVKIN IVDPHMFANF AVFAVLEKRD 
151 FDHGKIQGGN NAAAFPKKLA PKIFECFTGA FVGTVYRFVC LFYIINDGIA 
201 HHSAPQRVRY LFAPYCGFLP SASDSDLKSS XXSE* 
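
The nucleotide listing for SEQ ID 549 contains lower-case letters such as `s`, `k`, `m`, `r`, `w` and `y`. Assuming these are IUPAC ambiguity codes (the source does not say so explicitly), a codon containing one of them may encode more than one residue, which would explain the X positions in ORF136. The sketch below, with hypothetical helper names, expands an ambiguous codon and lists the residues it could encode under the standard genetic code.

```python
from itertools import product

# IUPAC nucleotide codes and the bases each one stands for.
IUPAC = {
    "A": "A", "C": "C", "G": "G", "T": "T",
    "R": "AG", "Y": "CT", "S": "CG", "W": "AT", "K": "GT", "M": "AC",
    "N": "ACGT",
}

# Standard genetic code, built from the conventional TCAG ordering.
BASES = "TCAG"
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    a + b + c: AMINO_ACIDS[16 * i + 4 * j + k]
    for i, a in enumerate(BASES)
    for j, b in enumerate(BASES)
    for k, c in enumerate(BASES)
}


def possible_residues(codon: str) -> set:
    """Return every residue that an (optionally ambiguous) codon could encode."""
    options = [IUPAC[base] for base in codon.upper()]
    return {CODON_TABLE["".join(bases)] for bases in product(*options)}


# Codon 115 of SEQ ID 549 as printed (nucleotides 343-345).
print(sorted(possible_residues("sGG")))  # ['G', 'R']
print(sorted(possible_residues("GCC")))  # ['A']
```

A codon that maps to a single residue despite an ambiguous base (for example `GCN`) would still translate unambiguously, so only codons with more than one possible residue need to be shown as X.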

Further work revealed the complete nucleotide sequence <SEQ ID 551>:

1 ATGATGAAGC GGCGTATAGC CGTCTTCGTC CTGTTCCCGC AGATAATCCG 

51 AGTTTTGGGA CAACTGTTGC CGAAAATCGT CAATACAGTT CCGGCACATC 

101 GGATGCTCTT CCAGATTTTC GGGATGTTCT TTTTCTTCAT ACACCAGCAA 

151 TATCTGCCCG GGATCGCCGA AATCGATTCC CCATGCGGCA TCGTGTTCGG 

201 TGCGCTCCTC TTCCGTCATC TGCCCGCGCA TTGCCTGTAT GGTAAAGCCG 

251 CCGTAGGGGA TGCCGTTGCA CACGAACATC CAGTCGCTGA TGTCGTCAAC 

301 CGGAACGCAA ACGCTTTCGC CTTGTTCGAC ATTGGTCAGT TCGCCGGGTT 

351 CATTGTTCAG CACACCGTAA ATATAAAGAC CGTCAAAATA AATATCGTCG 

401 ATCCACATAT GTTCGCAAAT TTCGCCGTCT TCGCCGTCTT GGAAAAAAGG 

451 GACTTTGACC ATGGCAAAAT CCAAGGCGGA AATAATGCGG CGGCGTTCCC 

501 AAAAAAGCTC GCGCCAAAAA TATTTGAATG TTTTACGGGC GCGTTCGTCG 

551 GCACGGTTTA CCGGTTCGTC TGCCTGTTCT ACATAATAAA TGACGGAATC 

601 GCCCATCATT CTGCTCCTCA ACGTGTACGG TATCTGTTTG CACCTTACTG 

651 CGGCTTTCTG CCTTCGGCAT CCGATTCGGA TTTGAAAAGT TCCAAATATT 

701 CGGAATAG 

This corresponds to the amino acid sequence <SEQ ID 552; ORF136-1>:

1 MMKRRIAVFV LFPQIIRVLG QLLPKIVNTV PAHRMLFQIF GMFFFFIHQQ

51 YLPGIAEIDS PCGIVFGALL FRHLPAHCLY GKAAVGDAVA HEHPVADVVN

101 RNANAFALFD IGQFAGFIVQ HTVNIKTVKI NIVDPHMFAN FAVFAVLEKR

151 DFDHGKIQGG NNAAAFPKKL APKIFECFTG AFVGTVYRFV CLFYIINDGI

201 AHHSAPQRVR YLFAPYCGFL PSASDSDLKS SKYSE*

Computer analysis of this amino acid sequence gave the following results: 
Homology with a predicted ORF from N. meningitidis (strain A) 

ORF136 shows 71.7% identity over a 237aa overlap with an ORF (ORF136a) from strain A of N.
meningitidis: 

10 20 30 40 50 59 

orfl36.pep MKRRIAVFVLFPQIIRVLGQLLPKIVNTVPAHRMLFQIFGMFFFFIHQQYLPGIAEIDS 
IIMIIMIt: I I I : I I t I I t M t I t I I I I I I I 1 I I I i I i I I t I I I I I t I I I I I t I 
orfl36a MMKRRIAVFVLLMQKIRILGQLLPKIVNTVPAHRMLFQXFGMFFFFIHQQYLPGIAEIDS 
10 20 30 40 50 60 

60 70 80 90 100 110 119 

orf 136 . pep PCGIVFGALL FRHLPAHCLYGKAAVGDAVAHEHPVADWNRNANAFALFDIGQFAXFIVQ 
1111111:11111 : I I t I I I I i t I : I I I I I I I I I I I I M i It I M I I I I I t I I I I i 1 
orf 136a PCG IV FGTLLFRHXSTHCLYGKAAVGNAVAHEHPVADWNRNANAFALFD IGQFAGFIVQ 

70 80 90 100 110 120 

120 130 140 150 160 170 179 

orf 136 . pep HTVNIKTVKINIVDPHMFANFAVFAVLEKRDFDHGKIQGGNNAAAFPKKLAPKIFECFTG 
I : : i : I I I i I I I i t M I I I M I i I M I t I : : I : I : I : : : : 

orf 136a HAINVKTVKINIVDPHMFANFAXFAVLEKRALTMAKSKXXXMRRRSQKSSRQKYLNVLRA 
130 140 150 160 170 180 

180 190 200 210 220 230 

orf 136. pep AFVGTVYRFVCLFYIINDGIAHH S APQR VR YL FA PYCGFLP SASDSDLKSS XX SEX 

: I I : i : : : : I I I I I i I I M I I It I I I I I M I I t t I I I III 

orf 136a R SPARFTGLSACSTXXMTESPIISAPQRVRYLFAPYCGFLPSASDSDLKSSKYSEX 

190 200 210 220 230 

The complete length ORF136a nucleotide sequence <SEQ ID 553> is:

1 ATGATGAAGC GGCGTATAGC CGTCTTCGTC CTGCTCATGC AGAAAATCCG 

51 GATTTTGGGA CAACTGTTGC CGAAAATCGT CAATACAGTT CCGGCACATC

101 GGATGCTCTT CCAGATNTTC GGGATGTTCT TTTTCTTCAT ACACCAGCAA 

151 TACCTGCCCG GGATCGCCGA AATCGATTCC CCATGCGGCA TCGTGTTCGG 

201 TACGCTCCTC TTCCGTCATC NGTCCACGCA TTGCCTGTAT GGTAAAGCCG 

251 CCGTAGGGAA TGCCGTTGCA CACGAACATC CAGTCGCTGA TGTCGTCAAC

301 CGGAACGCAA ACGCTTTCGC CTTGTTCGAC ATTGGTCAGT TCGCCGGGTT

351 CATTGTTCAG CACGCCATAA ATGTAAAGAC CGTCAT^TA AATATCGTCG 

401 ATCCACATAT GTTCGCAAAT TTCGCCNTCT TCGCCGTCTT GGAAAAAAGG 

451 GCTTTGACCA TGGCAAAATC TAAGGNGNNA NNGATGCGGC GGCGTTCCCA 

501 AAAAAGCTCG CGCCAAAAAT ATTTGAATGT TTTGCGGGCG CGTTCGCCGG 

551 CACGGTTTAC CGGTTTGTCT GCCTGTTCTA CATAATAAAT GACGGAATCG 

601 CCCATCATAT CTGCTCCTCA ACGTGTACGG TATCTGTTTG CACCTTACTG 

651 CGGCTTTCTG CCTTCGGCAT CCGATTCGGA TTTGAAAAGT TCCAAATATT 

701 CGGAATAG 

This encodes a protein having amino acid sequence <SEQ ID 554>: 

1 MMKRRIAVFV LLMQKIRILG QLLPKIVNTV PAHRMLFQXF GMFFFFIHQQ

51 YLPGIAEIDS PCGIVFGTLL FRHXSTHCLY GKAAVGNAVA HEHPVADVVN

101 RNANAFALFD IGQFAGFIVQ HAINVKTVKI NIVDPHMFAN FAXFAVLEKR

151 ALTMAKSKXX XMRRRSQKSS RQKYLNVLRA RSPARFTGLS ACST**MTES

201 PIISAPQRVR YLFAPYCGFL PSASDSDLKS SKYSE*

ORF136a and ORF136-1 show 73.1% identity in 238 aa overlap: 

10 20 30 40 50 60
orf136a.pep MMKRRIAVFVLLMQKIRILGQLLPKIVNTVPAHRMLFQXFGMFFFFIHQQYLPGIAEIDS
M i t I I I I I I I : I I I : I I I I 1 I 11 M I I t I I I I I I I I I i I I I t I I t I I I I I I t I M I
orf136-1 MMKRRIAVFVLFPQIIRVLGQLLPKIVNTVPAHRMLFQIFGMFFFFIHQQYLPGIAEIDS
10 20 30 40 50 60

70 80 90 100 110 120
orf136a.pep PCGIVFGTLLFRHXSTHCLYGKAAVGNAVAHEHPVADVVNRNANAFALFDIGQFAGFIVQ
I I t I I I I : I I I I i : I I I I I I I I I t : I I I I t I i I M I I I I I I I I I t I I I I I I I t t t t I I
orf136-1 PCGIVFGALLFRHLPAHCLYGKAAVGDAVAHEHPVADVVNRNANAFALFDIGQFAGFIVQ
70 80 90 100 110 120

130 140 150 160 170 180
orf136a.pep HAINVKTVKINIVDPHMFANFAXFAVLEKRALTMAKSKXXXMRRRSQKSSRQKYLNVLRA
|::|:||||||||IMf Milt IIIMII : :l : ): I :: : :
orf136-1 HTVNIKTVKINIVDPHMFANFAVFAVLEKRDFDHGKIQGGNNAAAFPKKLAPKIFECFTG
130 140 150 160 170 180

190 200 210 220 230
orf136a.pep R SPARFTGLSACSTXXMTESPIISAPQRVRYLFAPYCGFLPSASDSDLKSSKYSEX
: I t : I : : : : I II I I I I I I I t I I I I II I I M I I I I I I I M I M
orf136-1 AFVGTVYRFVCLFYIINDGIAHH SAPQRVRYLFAPYCGFLPSASDSDLKSSKYSEX
190 200 210 220 230

Homology with a predicted ORF from N.gonorrhoeae

ORF136 shows 92.3% identity over a 234aa overlap with a predicted ORF (ORF136ng) from 
N.gonorrhoeae: 

orf 136 . pep MKRRIAVFVLFPQIIRVLGQLLPKIVNTVPAHRMLFQIFGMFFFFIHQQYLPGIAEIDS 59 

I I I I I I I ) II : I II : n I I I I I 1 I II I I I 1 I t I I I I I I I I I II II : It 1 I I I I II t) 
orf 136ng MMKRRIAVFVLLMQKIRILGQLLPKIVNTVPAHRMLFQIFGMFFFFIHRQYLPGIAEIDS 60 

orf 136 . pep PCGIVFGALLFRHLPAHCLYGKAAVGDAVAHEHPVADWNRNANAFALFDIGQFAXFIVQ 119 

I 11111:11111! II II I I I I I II II I II I I I I I II : I f I I I I II M i I I I I II I I 
orfl36ng PGGIVFGTLLFRHLSAHCLYGKAAVGDAVAHEHPVADVANRNANAFALFDIGQSAGFIVQ 120 

orf 136 . pep HTVNIKTVKINIVDPHMFANFAVFAVLEKRDFDHGKIQGGNNAAAFPKKLAPKIFECFTG 179 

I I I I li t I I I I I M I I I I II t I I II t I I I I I I I I II I II I I I i I I I II II I I I : I I I t I I 
0rfl36ng HTVNIKTVKINIVDPHMFANFAVFAVLEKRDFDHGKIQGGNNAAAFPKKLAPKVFECFTG 180 

orf 136 . pep AFVGTVYRFVCLFYIINDGIAHHSAPQRVRYLFAPYCGFLPSASDSDLKSSXXSE 234 

I I : I t I I I I I I II I I I II n I II : I I I I I I I I I I II I II I II I I I I I I I II 
orf136ng AFAGTVYRFVCLFYIINDGIAHHTAPQRVRYLFAPYRGFLPPASDSDLKSSKYSE 235

The complete length ORF136ng nucleotide sequence <SEQ ID 555> is: 

1 ATGATGAAGC GGCGTATAGC CGTCTTCGTC CTGCTCATGC AGAAAATCCG 
51 GATTTTGGGA CAACTGTTGC CGAAAATCGT CAATACAGTT CCGGCACATC 



101 GGATGCTCTT CCAAATTTTC

151 TACCTGCCCG GGATCGCCGA 

201 TACGCTCCTC TTCCGTCATC 

251 CCGTAGGGGA TGCCGTTGCA 

301 CGGAACGCAA ACGCTTTCGC 

351 CATTGTTCAG CACACCGTAA 

401 ATCCACATAT GTTCGCAAAT 

451 GACTTTGACC ATGGCAAAAT 

501 AAAAAAGCTC GCGCCAAAAG 

551 GCACGGTTTA CCGGTTCGTC 

601 GCCCATCATA CTGCTCCTCA 

651 CGGTTTTCTA CCTCCGGCAT 

701 CGGAATAG 

This encodes a protein having amino acid sequence <SEQ ID 556>: 

1 MMKRRIAVFV LLMQKIRILG QLLPKIVNTV PAHRMLFQIF GMFFFFIHRQ

51 YLPGIAEIDS PGGIVFGTLL FRHLSAHCLY GKAAVGDAVA HEHPVADVAN

101 RNANAFALFD IGQSAGFIVQ HTVNIKTVKI NIVDPHMFAN FAVFAVLEKR

151 DFDHGKIQGG NNAAAFPKKL APKVFECFTG AFAGTVYRFV CLFYIINDGI

201 AHHTAPQRVR YLFAPYRGFL PPASDSDLKS SKYSE*

ORF136ng and ORF136-1 show 93.6% identity in 235 aa overlap: 

orfl36ng MMKRRIAVFVLLMQKIRILGQLLPKIVNTVPAHRMLFQIFGMFFFFIHRQYLPGIAEIDS 
M I I I I I I I I I : I t I : I I I t i i I t I M I I M I M I I I I I i I I n 1 I : I I I I I I I I tl I 
orf 136-1 MMKRRIAVFVLFPQIIRVLGQLLPKIVNTVPAHRMLFQIFGMFFFFIHQQYLPGIAEIDS

orfl36ng PGG I VFGTLLFRHLSAHCLYGKAAVGDAVAHEHPVADVANRNANAFALFD IGQSAGFIVQ 

t t M I I : M I M I I I i I f I I I t I I I I I I I I I I t i I [ : I I t t I M I t I I I I I I I I I I I 
orf 136-1 PCGIVFGALLFRHLPAHCLYGKAAVGDAVAHEHPVADWNRNANAFALFDIGQFAGFIVQ 

orfl36ng HTVNIKTVKINIVDPHMFANFAVFAVLEKRDFDHGKIQGGNNAAAFPKKLAPKVFECFTG 

1 1 i I M 1 1 i 1 1 1 1 1 1 1 1 1 1 i 1 1 1 i n I I I I I I I I I i I I I I I I I I I I I I I I I I I : I I I I I I 

orf 136-1 HTVNIKTVKINIVDPHMFANFAVFAVLEKRDFDHGKIQGGNNAAAFPKKLAPKIFECFTG 

orfl36ng AFAGTVYRFVCLFYIINDGIAHHTAPQRVRYLFAPYRGFLPPASDSDLKSSKYSEX 
M : I I I I I I I I I I I M I I I I I I t: i I I I I i I I I I M I I I I I I I t I t I I 1 t I I I i 
orf 136-1 AFVGTVYRFVCLFYIINDGIAHHSAPQRVRYLFAPYCGFLPSASDSDLKSSKYSEX 

Based on the presence of the putative transmembrane domains in the gonococcal protein, it is 
predicted that the proteins from Kmeningitidis and Kgonorrhoeaey and their epitopes, could be 
useful antigens for vaccines or diagnostics, or for raising antibodies. 



Example 67 

The following partial DNA sequence was identified in Kmeningitidis <SEQ ID 557>: 

1 ATGGAAAATA TGGTAACGTT TTCAAAAATC AGACCGCTTT TGGCAATCGC 

51 CGCCGCCGCG TTGCTTGCCG CC.TGCGGAC GGCGGGAAAT AATGCTGTCC 

101 GCAAGCCGGT GCAAACCGCC AAACCCGCCG CAGTGGTCGG TTTGGCACTC 

151 GGTGGCGGCG CATCTAAAGG ATTTGCCCAT GTAGGTATTA TTAAGGTTTT 

201 GAAAGAAAAC GGTATTCCTG TGAAGGTGGT TACCGGCACC TCCGCAGGTT 

251 GGATTGTCGG CAACCTTTTT GCATCGGGTA TGTCGCCCGA CCGCCTCGAA 

301 TTGGAAGCCG AAATTTTAGG CAAAACCGAT TTGGTCGATT TAACCTTGTC 

351 CACCAATGGG TTTATCAAAG GCGCAAAGCT GCAAAATTAC ATCAACCGAA 

401 AACTCCGCGG CATGCAGATT CAGCAGTTTC CCATCAAATT TGCCGCC. . 

This corresponds to the amino acid sequence <SEQ ID 558; ORF137>: 

1 MENMVTFSKI RPLLAIAAAA LLAAXRTAGN NAVRKPVQTA KPAAWGLAL 
51 GGGASKGFAH VGIIECVLKEN GIPVKWTGT SAGSXVGNLF ASGMSPDRLE 
101 LEAEILGKTD LVDLTLSTNG FIKGAKLQNY INRKLRGMQI QQFPIKFAA. . 

Further work revealed the complete nucleotide sequence <SEQ ID 559>: 



GGGATGTTCT TTTTCTTCAT ACACCGGCAA 
AATCGATTCC CCAGGCGGTA TCGTGTTCGG 
TGTCCGCGCA TTGCCTGTAC GGTAAAGCCG 
CACGAACATC CAGTCGCTGA TGTCGCCAAC 
CTTGTTCGAC ATTGGTCAGT CCGCCGGGTT 
ATATAAAGAC CGTCAT^AATA AATATCGTCG 
TTCGCCGTCT TCGCCGTCTT GGAAAAAAGG 
CCAAGGCGGA AATAATGCGG CGGCGTTCCC 
TATTTGAATG TTTTACGGGC GCGTTCGCCG 
TGCCTGTTCT ACATAATAAA TGACGGAATC 
ACGTGTACGG TATCTGTTTG CACCTTACCG 
CCGATTCGGA TTTGAAAAGT TCCAAATATT 
1 ATGGAAAATA TGGTAACGTT TTCAAAAATC AGACCGCTTT TGGCAATCGC

51 CGCCGCCGCG TTGCTTGCCG CCTGCGGCAC GGCGGGAAAT AATGCTGTCC 

101 GCAAGCCGGT GCAAACCGCC AAACCCGCCG CAGTGGTCGG TTTGGCACTC 

151 GGTGGCGGCG CATCTAAAGG ATTTGCCCAT GTAGGTATTA TTAAGGTTTT 

201 GAAAGAAAAC GGTATTCCTG TGAAGGTGGT TACCGGCACA TCGGCAGGTT 

251 CGATTGTCGG CAGCCTTTTT GCATCGGGTA TGTCGCCCGA CCGCCTCGAA 

301 TTGGAAGCCG AAATTTTAGG CAAAACCGAT TTGGTCGATT TAACCTTGTC 

351 CACCAGTGGT TTTATCAAAG GCGAAAAGCT GCAAAATTAC ATCAACCGAA 

401 AAGTCGGCGG CAGGCAGATT CAGCAGTTTC CCATCAAATT TGCCGCCGTT 

451 GCTACTGATT TTGAAACCGG CAAGGCCGTC GCTTTCAATC AGGGGAATGC

501 CGGGCAGGCT GTGCGCGCTT CCGCCGCCAT TCCCAATGTG TTCCAACCCG 

551 TTATCATCGG CAGGCATACA TATGTTGACG GCGGTCTGTC GCAGCCCGTG 

601 CCCGTCAGTG CCGCCCGGCG GCAGGGGGCG AATTTCGTGA TTGCCGTCGA 

651 TATTTCCGCC CGTCCGGGCA AAAACATCAG CCAAGGTTTC TTCTCTTATC 

701 TCGATCAGAC GCTGAACGTA ATGAGCGTTT CTGCGTTGCA AAATGAGTTG 

751 GGGCAGGCGG ATGTGGTTAT CAAACCGCAG GTTTTGGATT TGGGTGCAGT 

801 CGGCGGATTC GATCAGAAAA AACGCGCCAT CCGGTTGGGT GAGGAGGCAG 

851 CACGTGCCGC ATTGCCTGAA ATCAAACGCA AACTGGCGGC ATACCGTTAT 

901 TGA 

This corresponds to the amino acid sequence <SEQ ID 560; ORF137-1>:

1 MENMVTFSKI RPLLAIAAAA LLAACGTAGN NAVRKPVQTA KPAAVVGLAL
51 GGGASKGFAH VGIIKVLKEN GIPVKVVTGT SAGSIVGSLF ASGMSPDRLE
101 LEAEILGKTD LVDLTLSTSG FIKGEKLQNY INRKVGGRQI QQFPIKFAAV
151 ATDFETGKAV AFNQGNAGQA VRASAAIPNV FQPVIIGRHT YVDGGLSQPV
201 PVSAARRQGA NFVIAVDISA RPGKNISQGF FSYLDQTLNV MSVSALQNEL
251 GQADVVIKPQ VLDLGAVGGF DQKKRAIRLG EEAARAALPE IKRKLAAYRY
301 *
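
The correspondence between a nucleotide listing and its amino acid sequence can be checked by translating the listing codon by codon. The following is a minimal sketch, assuming the standard genetic code; the function name `translate` is illustrative and is not part of the original disclosure. It translates the first 60 nucleotides of SEQ ID 559.

```python
# Illustrative sketch only: translate a DNA string with the standard
# genetic code, stopping at the first stop codon.
from itertools import product

BASES = "TCAG"
# Amino acids in the conventional TCAG ordering of the 64 codons.
AMINO = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {"".join(c): aa for c, aa in zip(product(BASES, repeat=3), AMINO)}


def translate(dna: str) -> str:
    """Translate DNA to protein; unknown codons (e.g. containing N) give X."""
    dna = "".join(dna.split()).upper()  # drop the spaces used in the listings
    protein = []
    for i in range(0, len(dna) - 2, 3):
        aa = CODON_TABLE.get(dna[i:i + 3], "X")
        if aa == "*":
            break
        protein.append(aa)
    return "".join(protein)


# First 60 nucleotides of SEQ ID 559 as listed above.
seq_559_start = ("ATGGAAAATA TGGTAACGTT TTCAAAAATC AGACCGCTTT TGGCAATCGC "
                 "CGCCGCCGCG")
print(translate(seq_559_start))  # prints MENMVTFSKIRPLLAIAAAA
```

The output agrees with the first twenty residues of SEQ ID 560.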

Computer analysis of this amino acid sequence gave the following results: 
Homology with a predicted ORF from N.meningitidis (strain A)

ORF137 shows 93.3% identity over a 149aa overlap with an ORF (ORF137a) from strain A of N.
meningitidis:

10 20 30 40 50 60 

orf 137 . pep MENMVTFSKIRPLLAIAAAALLAAXRTAGNNAVRKPVQTAKPAAWGLALGGGASKGFAH 
t I I i I t t I I I I I I I I I I I i I t I I I M I i t I : i t I I I I I I M I I M t i I M I I t I I I I I 
orf 137a MENMVTFSKIRPLLAIAAAALLAACGTAGNNAARKPVQTAKPAAWGLALGGGASKGFAH 

10 20 30 40 50 60 

70 80 90 100 110 120 

orf 137, pep VGIIKVLKENGIPVKWTGTSAGSIVGNLFASGMSPDRLELEAEILGKTDLVDLTLSTNG 
■ t I i I I I i I I I I I I I I I I I 1 I i I t I I I I : I I I I I I t I i I I M I I I I I i I I M I I I I I I I : I 
orf 137a VGIIKVLKENGIPVKWTGTSAGSIVGSLFASGMSPDRLELEAEILGKTDLVDLTLSTSG 

70 80 90 100 110 120 

130 140 149 

or f 137 . pep FIKGAKLQNYINRKLRGMQIQQFPIKFAA 
Mil I M I I M I I : I : I I I t I I I t I I 
orf 137a FIKGEKLQNYINRKVGGRRIQQFPIKFAAVATDFETGKAVAFNQGNAGQAVRASAAIPNV 

130 140 150 160 170 180 

The complete length ORF137a nucleotide sequence <SEQ ID 561> is: 

1 ATGGAAAATA TGGTAACGTT TTCAAAAATC AGACCGCTTT TGGCAATCGC 

51 CGCCGCCGCG TTGCTTGCCG CCTGCGGCAC GGCGGGAAAT AATGCTGCCC 

101 GCAAGCCGGT GCAAACCGCC AAACCCGCCG CAGTGGTCGG TTTGGCACTC 

151 GGTGGCGGCG CATCTAAAGG ATTTGCCCAT GTAGGTATTA TTAAGGTTTT 

201 GAAAGAAAAC GGTATTCCTG TGAAGGTGGT TACCGGCACA TCGGCAGGTT 

251 CGATAGTCGG CAGCCTTTTT GCATCGGGTA TGTCGCCCGA CCGCCTCGAA 

301 TTGGAAGCCG AAATTTTAGG TAAAACCGAT TTGGTCGATT TAACCTTGTC 

351 CACCAGTGGT TTTATCAAAG GCGAAAAGCT GCAAAATTAC ATCAACCGAA 

401 AAGTCGGCGG CAGGCGGATT CAGCAGTTTC CCATCAAATT TGCCGCCGTT 

451 GCTACTGATT TTGAAACCGG CAAGGCCGTC GCTTTCAATC AAGGGAATGC 

501 CGGGCAGGCT GTGCGCGCTT CCGCCGCCAT TCCCAATGTG TTCCAACCCG 

551 TTATCATCGG CAGGCATACA TATGTTGACG GCGGTCTGTC GCAGCCCGTG 
601 CCCGTCAGTG CCGCCCGGCG GCANGNNNNG NATNTCGTGA TTGCCGTCGA

651 TATTTCCGCC CGTCCGAGCA AAAACATCAG CCAAGGCTTC TTCTCTTATC 

701 TCGATCAGAC GCTGAACGTA ATGAGCGTTT CCGCGTTGCA AAATGAGTTG 

751 GGGCAGGCGG ATGTGGTTAT CAAACCGCAG GTTTTGGATT TGGGTGCAGT 

801 CGGCGGATTC GATCAGAAAA AACGCGCCAT CCGGTTGGGT GAGGAGGCAG 

851 CACGTGCCGC ATTGCCTGAA ATCAAACGCA AACTGGCGGC ATACCGTTAT 

901 TGA 

This encodes a protein having amino acid sequence <SEQ ID 562>: 

1 MENMVTFSKI RPLLAIAAAA LLAACGTAGN NAARKPVQTA KPAAVVGLAL
51 GGGASKGFAH VGIIKVLKEN GIPVKVVTGT SAGSIVGSLF ASGMSPDRLE
101 LEAEILGKTD LVDLTLSTSG FIKGEKLQNY INRKVGGRRI QQFPIKFAAV
151 ATDFETGKAV AFNQGNAGQA VRASAAIPNV FQPVIIGRHT YVDGGLSQPV
201 PVSAARRXXX XXVIAVDISA RPSKNISQGF FSYLDQTLNV MSVSALQNEL
251 GQADVVIKPQ VLDLGAVGGF DQKKRAIRLG EEAARAALPE IKRKLAAYRY
301 *

ORF137a and ORF137-1 show 97.3% identity in 300 aa overlap: 



orf 137a . pep MENMVTFSKI RPLLAIAAAALLAACGTAGNNATUIKPVQTAKPAAVVGLALGGGASKGFAH 
t i I I I I I I I I t I I I t I n I I I I t I I t I I I I I I : I I t I I I i I I I t I M I I I t I I I I I I t i I 
orf 137-1 MENMVTFSKIRPLIAIAAAALLAACGTAGNNAVRKPVQTAKPAAVVGLALGGGASKGFAH 



orf 137a . pep VGIIKVLKENGIPVKWTGTSAGSIVGSLFASGMSPDRLELEAEILGECTDLVDLTLSTSG 
t I I I I I I i I I I I I I I I I I I I I I I I I i I I I I I I I I I i I I t I I I I t I i I t I I i I I i i I I I I I 
orf 137-1 VGIIKVLKENGIPVKWTGTSAGSIVGSLFASGMSPDRLELEAEILGKTDLVDLTLSTSG 



orf 137a . pep FIKGEKLQNYINRKVGGRRIQQFPIKFAAVATDFETGKAVAFNQGNAGQAVRASAAIPNV 
I I I I I I I I I I I I i I I I I I : I I I I I I I I t t t I I I I I t I I 1 i I I I I t i t I I I t I I I I I I I ! I 
or fl 3 7 - 1 FXKGEKLQNY INRKVGGRQIQQFPIKFAAVATDFETGKAVAFNQGNAGQAVRASAAI PNV 



orf 137a . pep FQPVIIGRHTYVDGGLSQPVPVSAARRXXXXXVIAVDISARPSKNISQGFFSYLDQTLNV 
I I I I I I I M I I I I I I I I I M It I I I i I I I I 1 I [ I I i I : t I I M t I t t 1 I M I I I i 

orf 137-1 FQPVIIGRHTYVDGGLSQPVPVSAARRQGANFVIAVDISARPGKNISQGFFSYLDQTLNV 



orf 137a . pep MSVSALQNELGQADWIKPQVLDLGAVGGFDQKKRAIRLGEEAARAALPEIKRKLAAYRY 

1 1 1 1 1 1 1 1 M I M 1 1 i 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 i 1 1 1 1 1 1 n I I I I I I i I i I t I I I [ I I I t I 

orf 137-1 MSVSALQNELGQADWIKPQVLDLGAVGGFDQKKRAIRLGEEAARAALPEIKRKLAAYRY 



Homology with a predicted ORF from N. gonorrhoeae 

ORF137 shows 89.9% identity over a 149aa overlap with a predicted ORF (ORF137ng) from 
N.gonorrhoeae: 

orf 137 . pep MENMVTFSKIRPLLAIAAAALLAAXRTAGNNAVRKPVQTAKPAAWGLALGGGASKGFAH 60 

I I I I I I I I I I I : I ) M I t t M I I I t M M : I I I I ) I I I I I I i I: I M I I I i t I I I I I 
or f 1 37ng MENMVTFSKIRSFLAIAAAALLAACGTAGNNAARKPVQTAKPAAWALALGGGASKGFAH 60 



or f 137 . pep VGIIKVLKENGIPVKWTGTSAGSIVGNLFASGMSPDRLELEAEILGKTDLVDLTLSTNG 120 

:||:|ilttlltltllllilltlltll:|:ittt!llllltllltllllililltltl:l 
orfl37ng IGIVKVLKENGIPVKWTGTSAGSIVGSLLASGMSPDRLELEAEILGKTDLVDLTLSTSG 120 

or f 137 . pep FIKGAKLQNYINRKLRGMQIQQFPIKFAA 149 

i I t I I I I I I I I I I : t I i I I I I I I I 1 i 
orfl37ng FIKGEKLQNYINRKVGGRQIQQFPIKFAAVATDFETGKAVAFNQGNAGQAVRASAAIPNV 180 

The complete length ORF137ng nucleotide sequence <SEQ ID 563> is:

1 ATGGAAAATA TGGTAACGTT TTCAAAAATC AGATCATTTT TGGCAATCGC
51 CGCCGCCGCG TTGCTTGCCG CCTGCGGTAC GGCGGGAAAC AATGCCGCCC
101 GCAAGCCGGT GCAAACCGCC AAACCCGCCG CAGTGGTCGC TTTGGCACTC
151 GGTGGCGGCG CATCTAAAGG ATTTGCCCAT ATAGGAATTG TTAAGGTTTT
201 GAAAGAAAAC GGTATTCCTG TGAAGGTGGT TACCGGCACA TCGGCAGGTT
251 CGATAGTCGG CAGCCTTTTG GCATCGGGTA TGTCGCCCGA CCGCCTCGAA
301 TTGGAAGCCG AGATTTTAGG TAAAACCGAT TTAGTCGATT TAACCTTGTC
351 CACCAGTGGT TTTATCAAAG GCGAAAAGCT GCAAAATTAC ATCAACCGAA
401 AAGTCGGCGG CAGGCAGATT CAGCAGTTTC CCATCAAATT TGCCGCCGTT
451 GCCACTGATT TTGAAACCGG CAAGGCCGTC GCTTTCAATC AAGGGAATGC
501 GGGGCAGGCG GTTCGTGCTT CCGCCGCCAT TCCCAATGTG TTCCAGCCAG
551 TCATCATCGG CAGGCACAAA TATGTTGACG GCGGTCTGTC GCAGCCCGTG
601 CCCGTCAGTG CCGCTCGGCG GCAGGGGGCG AATTTCGTGA TTGCCGTCGA
651 TATTTCCGCA CGTCCGAGCA AAAATGTCGG TCAAGGTTTC TTCTCTTATC
701 TCGATCAGAC GCTGAACGTG ATGAGCGTTT CCGTGTTGCA AAACGAGTTG
751 gggcAGGCGG ATGTGGTTAT CAAACCGCag gtTTTGGATT TGGGTGCAGT
801 CGGCGGATTC GATCAGAAAA AGCGCGCCAT CCGGTTGGGC GAGGAGGCAG
851 CACGTGCCGC ATTGCCTGAA ATCAAACGCA AACTGGCGGC ATACCGTTAT
901 TGA



This encodes a protein having amino acid sequence <SEQ ID 564>: 



1 MENMVTFSKI RSFLAIAAAA LLAACGTAGN NAARKPVQTA KPAAVVALAL
51 GGGASKGFAH IGIVKVLKEN GIPVKVVTGT SAGSIVGSLL ASGMSPDRLE
101 LEAEILGKTD LVDLTLSTSG FIKGEKLQNY INRKVGGRQI QQFPIKFAAV
151 ATDFETGKAV AFNQGNAGQA VRASAAIPNV FQPVIIGRHK YVDGGLSQPV
201 PVSAARRQGA NFVIAVDISA RPSKNVGQGF FSYLDQTLNV MSVSVLQNEL
251 GQADVVIKPQ VLDLGAVGGF DQKKRAIRLG EEAARAALPE IKRKLAAYRY
301 *



ORF137ng and ORF137-1 show 96.0% identity in 300 aa overlap: 



orfl37ng MENMVTFSKIRSFLAIAAAALLAACGTAGNNAARKPVQTAKPAAWALALGGGASKGFAH 
I I I I t I I i I i I : I 1 M I t t I I I I I M I I M i : I t I I t It I M I t I : I i I I I I M t t I I 1 
orf 137-1 MENMVTFSKIRPLLAIAAAALLAACGTAGNNAVRKPVQTAKPAAWGLALGGGASKGFAH 

orfl37ng IGIVKVLKENGIPVKWTGTSAGSIVGSLLASGMSPDRLELEAEILGKTDLVDLTLSTSG 
:|l:|||IM[lll!tlllMMIiMII:tlil[lllllllll[!lllliillllllM 
orf 137-1 VGIIKVLKENGIPVKWTGTSAGSIVGSLFASGMSPDRLELEAEILGKTDLVDLTLSTSG 

orfl37ng FIKGEKLQNY INRKVGGRQIQQFPIKFAAVATDFETGKAVAFNQGNAGQAVRASAAIPNV 

1 1 1 1 1 1 1 1 1 1 1 i 1 1 1 1 i 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 i i 1 1 1 1 1 1 1 i n 1 1 1 1 1 1 1 1 i I i 1 1 1 1 

orf 137-1 FIKGEKLQNYINRKVGGRQIQQFPIKFAAVATDFETGKAVAFNQGNAGQAVRASAAIPNV 

orfl37ng FQPVIIGRHKYVDGGLSQPVPVSAARRQGANFVIAVDISARPSKNVGQGFFSYLDQTLNV 
M It I I I I I I I M I I I I II I If I 1 I t I I t i I I II I I I t II I : II :: I II I II I I M I t I 
orf 137-1 FQPVIIGRHTYVDGGLSQPVPVSAARRQGANFVIAVDISARPGKNISQGFFSYLDQTLNV 

orfl37ng MSVSVLQNELGQADWIKPQVLDLGAVGGFDQKKRAIRLGEEAARAALPE IKRKLAAYRY 

I I I I : II M I I I I I II M II I I II I I I I I [ I II I I I I II I I II I I I I It I II I I I I I I II 
orf 137 MSVSALQNELGQADWIKPQVLDLGAVGGFDQKKRAIRLGEEAARAALPEIKRKLAAYRY 

Based on the presence of a predicted prokaryotic membrane lipoprotein lipid attachment site
(underlined) in the gonococcal protein, it is predicted that the proteins from N.meningitidis and
N.gonorrhoeae, and their epitopes, could be useful antigens for vaccines or diagnostics, or for
raising antibodies.
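
A lipid attachment site of this kind is usually recognised by a short consensus immediately preceding a cysteine near the N-terminus. The sketch below is illustrative only: the regular expression is an assumed consensus modelled on published lipoprotein signal patterns, the 15-35 position window is likewise an assumption, and neither is stated in the source to be the method actually used. The names `LIPOBOX` and `find_lipid_attachment_cys` are hypothetical.

```python
import re

# Assumed consensus (not taken from the source): six residues that are not
# D/E/R/K, two small or hydrophobic residues, one permissive position,
# then [AGS] immediately before the candidate lipidated Cys.
LIPOBOX = re.compile(r"[^DERK]{6}[LIVMFWSTAG]{2}[LIVMFYSTAGCQ][AGS]C")


def find_lipid_attachment_cys(protein, first=15, last=35):
    """Return the 1-based position of the candidate Cys, or None."""
    for m in LIPOBOX.finditer(protein):
        cys_pos = m.end()  # exclusive 0-based end equals 1-based Cys index
        if first <= cys_pos <= last:
            return cys_pos
    return None


# First 40 residues of SEQ ID 564 (ORF137ng).
orf137ng_nterm = "MENMVTFSKIRSFLAIAAAALLAACGTAGNNAARKPVQTA"
print(find_lipid_attachment_cys(orf137ng_nterm))  # prints 25
```

Under these assumptions the candidate residue is Cys-25, in the segment LLAAC.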



Example 68 

The following partial DNA sequence was identified in N.meningitidis <SEQ ID 565>: 

1 ATGTTTCGTT TACAATTCAG GCTGTTTCCC CCTTTGCGAA CCGCCATGCA 

51 CATCCTGTTG ACCGCCCTGC TCAAATGCCT CTCCCTGcTG CCGCTTTCCT 

101 GTCTGCACAC GCTGGGAAAC CGGCTCGGAC ATCTGGCGTT TTACCTTTTA 

151 AAGGAAGACC GCGCGCGCAT CGTCGCCmAT ATGCGGCAGG CGGGTTTGAA 

201 CCCCGACCCC AAAACGGTCA AAGCCGTTTT TGCGGAAACG GCAAAAGGCG 

251 GTTTGGAACT TGCCCCCGCG TTTTTCAGAA AACCGGAAGA CATAGAAACA 

301 ATGTTCAAAG CGGTACACGG CTGGGAACAT GTGCAGCAGG CTTTGGACAA 

351 ACACGAAGGG CTGCTATTC. . 

This corresponds to the amino acid sequence <SEQ ID 566; ORF138>: 



1 MFRLQFRLFP PLRTAMHILL TALLKCLSLL PLSCLHTLGN RLGHLAFYLL 
51 KEDRARIVAX MRQAGLNPDP KTVKAVFAET AKGGLELAPA FFRKPEDIET 
101 MFKAVHGWEH VQQALDKHEG LLF 
Further work revealed the complete nucleotide sequence <SEQ ID 567>:

1 ATGTTTCGTT TACAATTCAG GCTGTTTCCC CCTTTGCGAA CCGCCATGCA 

51 CATCCTGTTG ACCGCCCTGC TCAAATGCCT CTCCCTGCTG CCGCTTTCCT 

101 GTCTGCACAC GCTGGGAAAC CGGCTCGGAC ATCTGGCGTT TTACCTTTTA 

151 AAGGAAGACC GCGCGCGCAT CGTCGCCAAT ATGCGGCAGG CGGGTTTGAA 

201 CCCCGACCCC AAAACGGTCA AAGCCGTTTT TGCGGAAACG GCAAAAGGCG 

251 GTTTGGAACT TGCCCCCGCG TTTTTCAGAA AACCGGAAGA CATAGAAACA

301 ATGTTCAAAG CGGTACACGG CTGGGAACAT GTGCAGCAGG CTTTGGACAA 

351 ACACGAAGGG CTGCTATTCA TCACGCCGCA CATCGGCAGC TACGATTTGG

401 GCGGACGCTA CATCAGCCAG CAGCTTCCGT TCCCGCTGAC CGCCATGTAC 

451 AAACCGCCGA AAATCAAAGC GATAGACAAA ATCATGCAGG CGGGCAGGGT 

501 TCGCGGCAAA GGAAAAACCG CGCCTACCAG CATACAAGGG GTCAAACAAA 

551 TCATCAAAGC CCTGCGTTCG GGCGAAGCAA CCATCGTCCT GCCCGACCAC 

601 GTCCCCTCCC CTCAAGAAGG CGGGGAAGGC GTATGGGTGG ATTTCTTCGG 

651 CAAACCTGCC TATACCATGA CGCTGGCGGC AAAATTGGCA CACGTCAAAG 

701 GCGTGAAAAC CCTGTTTTTC TGCTGCGAAC GCCTGCCTGG CGGACAAGGT 

751 TTCGATTTGC ACATCCGCCC CGTCCAAGGG GAATTGAACG GCGACAAAGC 

801 CCATGATGCC GCCGTGTTCA ACCGCAATGC CGAATATTGG ATACGCCGTT 

851 TTCCGACGCA GTATCTGTTT ATGTACAACC GCTACAAAAT GCCGTAA

This corresponds to the amino acid sequence <SEQ ID 568; ORF138-1>:

1 MFRLQFRLFP PLRTAMHILL TALLKCLSLL PLSCLHTLGN RLGHLAFYLL

51 KEDRARIVAN MRQAGLNPDP KTVKAVFAET AKGGLELAPA FFRKPEDIET

101 MFKAVHGWEH VQQALDKHEG LLFITPHIGS YDLGGRYISQ QLPFPLTAMY 

151 KPPKIKAIDK IMQAGRVRGK GKTAPTSIQG VKQIIKALRS GEATIVLPDH 

201 VPSPQEGGEG VWVDFFGKPA YTMTLAAKLA HVKGVKTLFF CCERLPGGQG 

251 FDLHIRPVQG ELNGDKAHDA AVFNRNAEYW IRRFPTQYLF MYNRYKMP* 
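
The length of the encoded protein follows directly from the length of the nucleotide listing. The sketch below is an illustrative check, not part of the original disclosure; it assumes that the final row of SEQ ID 567 begins at nucleotide 851 and ends in the stop codon TAA, and the helper names are hypothetical.

```python
STOP_CODONS = {"TAA", "TAG", "TGA"}


def total_length(last_row_start, last_row):
    """Total nucleotides, given the start index and text of the final row."""
    return last_row_start - 1 + len("".join(last_row.split()))


def encoded_residues(total_nt, final_codon):
    """Number of amino acids encoded, excluding a terminal stop codon."""
    if total_nt % 3:
        raise ValueError("length is not a whole number of codons")
    codons = total_nt // 3
    return codons - 1 if final_codon.upper() in STOP_CODONS else codons


# Final row of SEQ ID 567, assumed to read as follows.
last_row = "TTCCGACGCA GTATCTGTTT ATGTACAACC GCTACAAAAT GCCGTAA"
n = total_length(851, last_row)
print(n, encoded_residues(n, last_row.replace(" ", "")[-3:]))  # prints 897 298
```

The result, 298 residues, agrees with the length of SEQ ID 568.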

Computer analysis of this amino acid sequence gave the following results: 
Homology with a predicted ORF from N. meningitidis (strain A)

ORF138 shows 99.2% identity over a 123aa overlap with an ORF (ORF138a) from strain A of N.
meningitidis:

10 20 30 40 50 60 

orf 138 . pep MFRLQFRLFPPLRTAMHILLTALLKCLSLLPLSCLHTLGNRLGHLAFYLLKEDRARIVAX 

I I I I I I i 1 I I I I 1 I I I I I I i I I I I I I I I I I I I I I I I i I I I I I t I I i M I I I i I I I I I I I 
orf 138a MFRLQFRLFPPLRTAMHILLTALLKCLSLLPLSCLHTLGNRLGHLAFYLLKEDRARIVAN 

10 20 30 40 50 60 

70 80 90 100 110 120 

orf 138 . pep MRQAGLNPDPKTVKAVFAETAKGGLELAPAFFRKPEDIETMFKAVHGWEHVQQALDKHEG 
I I I I I I i i I i I I t I I I I I I I I I I I I I I I I I I I I I I I M t I I I M I I I [ I I I I I I I I I I I I 
orf 138a MRQAGLNPDPKTVKAVFAETAKGGLELAPAFFRKPEDIETMFKAVHGWEHVQQALDKHEG 

70 80 90 100 110 120 



orf138.pep  LLF
            |||
orf138a     LLFITPHIGSYDLGGRYISQQLPFPLTAMYKPPKIKAIDKIMQAGRVRGKGKTAPTSIQG
                   130       140       150       160       170       180
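
The identity figure quoted for an alignment can be recomputed from the aligned sequences. The following is a minimal sketch, assuming that identity is the number of identical columns divided by the number of alignment columns (gap columns included in the overlap); the function name `percent_identity` is illustrative. It uses the ORF138 and ORF138a residues exactly as printed in the alignment above.

```python
def percent_identity(aligned_a, aligned_b, gap="-"):
    """Return (identical columns, overlap columns, percent identity)."""
    if len(aligned_a) != len(aligned_b):
        raise ValueError("aligned sequences must have equal length")
    overlap = len(aligned_a)
    matches = sum(1 for x, y in zip(aligned_a, aligned_b)
                  if x == y and x != gap)
    return matches, overlap, round(100.0 * matches / overlap, 1)


# The 123 aligned columns of ORF138 and ORF138a as printed above.
orf138 = ("MFRLQFRLFPPLRTAMHILLTALLKCLSLLPLSCLHTLGNRLGHLAFYLLKEDRARIVAX"
          "MRQAGLNPDPKTVKAVFAETAKGGLELAPAFFRKPEDIETMFKAVHGWEHVQQALDKHEG"
          "LLF")
orf138a = ("MFRLQFRLFPPLRTAMHILLTALLKCLSLLPLSCLHTLGNRLGHLAFYLLKEDRARIVAN"
           "MRQAGLNPDPKTVKAVFAETAKGGLELAPAFFRKPEDIETMFKAVHGWEHVQQALDKHEG"
           "LLF")
print(percent_identity(orf138, orf138a))  # prints (122, 123, 99.2)
```

The single non-identical column is the undetermined residue X at position 60 of ORF138.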

The complete length ORF138a nucleotide sequence <SEQ ID 569> is: 

1 ATGTTTCGTT TACAATTCAG GCTGTTTCCC CCTTTGCGAA CCGCCATGCA 

51 CATCCTGTTG ACCGCCCTGC TCAAATGCCT CTCCCTGCTG CCGCTTTCCT 

101 GTCTGCACAC GCTGGGAAAC CGGCTCGGAC ATCTGGCGTT TTACCTTTTA 

151 AAGGAAGACC GCGCGCGCAT CGTCGCCAAT ATGCGTCAGG CAGGCATGAA 

201 TCCCGACCCC AAAACGGTCA AAGCCGTTTT TGCGGAAACG GCAAAAGGCG 

251 GTTTGGAACT TGCCCCCGCG TTTTTCAGAA AACCGGAAGA CATAGAAACA 

301 ATGTTCAAAG CGGTACACGG CTGGGAACAT GTGCAGCAGG CTTTGGACAA 

351 ACACGAAGGG CTGCTATTCA TCACGCCGCA CATCGGCAGC TACGATTTGG 

401 GCGGACGCTA CATCAGCCAG CAGCTTCCGT TCCCGCTGAC CGCCATGTAC 

451 AAACCGCCGA AAATCAAAGC GATAGACAAA ATCATGCAGG CGGGCAGGGT 

501 TCGCGGCAAA GGAAAAACCG CGCCTACCAG CATACAAGGG GTCAAACAAA 

551 TCATCAAAGC CCTGCGTTCG GGCGAAGCAA CCATCGTCCT GCCCGACCAC 
601 GTCCCCTCCC CTCAAGAAGG CGGGGAAGGC GTATGGGTGG ATTTCTTCGG
651 CAAACCTGCC TATACCATGA CGCTGGCGGC AAAATTGGCA CACGTCAAAG
701 GCGTGAAAAC CCTGTTTTTC TGCTGCGAAC GCCTGCCTGG CGGACAAGGT
751 TTCGATTTGC ACATCCGCCC CGTCCAAGGG GAATTGAACG GCGACAAAGC
801 CCATGATGCC GCCGTGTTCA ACCGCAATGC CGAATATTGG ATACGCCGTT
851 TTCCGACGCA GTATCTGTTT ATGTACAACC GCTACAAAAT GCCGTAA

This encodes a protein having amino acid sequence <SEQ ID 570>:



1 MFRLQFRLFP PLRTAMHILL TALLKCLSLL PLSCLHTLGN RLGHLAFYLL

51 KEDRARIVAN MRQAGLNPDP KTVKAVFAET AKGGLELAPA FFRKPEDIET 

101 MFKAVHGWEH VQQALDKHEG LLFITPHIGS YDLGGRYISQ QLPFPLTAMY 

151 KPPKIKAIDK IMQAGRVRGK GKTAPTSIQG VKQIIKALRS GEATIVLPDH 

201 VPSPQEGGEG VWVDFFGKPA YTMTLAAKLA HVKGVKTLFF CCERLPGGQG 

251 FDLHIRPVQG ELNGDKAHDA AVFNRNAEYW IRRFPTQYLF MYNRYKMP* 

ORF138a and ORF138-1 show 99.7% identity over a 298aa overlap: 



orf 138a . pep MFRLQFRLFPPLRTAMHILLTALLKCLSLLPLSCLHTLGNRLGHLAFYLLKEDRARIVAN 
M I I I 1 I I I I t I I I I I I I I I I I I I i I I I I I t I I i I I I I I I I I I I I I I I I I I I I I t I I I I I 
orf 138-1 MFRLQFRLFPPLRTAMHILLTALLKCLSLLPLSCLHTLGNRLGHLAFYLLICEDRARIVAN 



orf 138a . pep MRQAGMNPDPKTVKAVFAETAKGGLELAPAFFRKPEDIETMFKAVHGWEHVQQALDKHEG 
I I I t I : I I i I i I I I I I I I I I t i I I I I I 1 t I I I I M I 1 I I I I t I I I I I I I I i I I I I I I I I I 
orf 138-1 MRQAGLNPDPKTVKAVFAETAKGGLELAPAFFRKPEDIETMFKAVHGWEHVQQALDKHEG 

orf 138a . pep LLFITPHIGSYDLGGRYISQQLPFPLTAMYKPPKIKAIDKIMQAGRVRGKGKTAPTSIQG 

I I I M 1 1 I t I I I 1 1 I I I 1 1 I i I I I I I I I t I 1 I I M 1 1 I I I I t i I I t I I I I 1 1 1 I I I I M I 

orfl38-l LLFIT PHIGS YDLGGRY I SQQLPFPLTAMYKPPKIKAI DKIMQAGRVRGKGKTAPTS IQG 



orf 138a . pep VKQIIKALRSGEATIVLPDHVPSPQEGGEGVWVDFFGKPAYTMTLAAKLAHVKGVKTLFF 
t I I I I I t i I I I I I t t I I I I I t I I I I I I I I I I I I I I i I I I I I t I M t t I I t I I I I I I t I I I 
orf 138-1 VKQIIK7a.RSGEATIVLPDHVPSPQEGGEGVWVDFFGKPAYTMTLAAKLAHVKGVKTLFF 



orL'138a . pep CCERLPGGQGFDLHIRPVQGELNGDKAHDAAVFNRNAEYWIRRFPTQYLFMYNRYKMP 
I I I I I I I I I I I I I M I I I I I I i I [ t I I t I I I I I I i I i M I i I I I I I I I I I I I i I I t I I 
orf 138-1 CCERLPGGQGFDLHIRPVQGELNGDKAHDAAVFNRNAEYWIRRFPTQYLFMYNRYKMP 

Homology with a predicted ORF from N. gonorrhoeae

ORF138 shows 94.3% identity over a 123aa overlap with a predicted ORF (ORF138ng) from
N. gonorrhoeae:

orf 138 . pep MFRLQFRLFPPLRTAMHILLTALLKCLSLLPLSCLHTLGNRLGHLAFYLLKEDRARIVAX 60 

I M I i I I I I I M I I I I I I I I I I I t I I I 1 I 1 I I I I 1 I I I I I M 1 I I I I I I I I I I I ) I I I 
orfl38ng MFRLQFRLFPPLRTAMHILLTALLKCLSLLSLSCLHTLGNRLGHLAFYLLKEDRARIVAN 60 

orf 138 . pep MRQAGLNPDPKTVKAVFAETAKGGLELAPAFFRKPEDIETMFKAVHGWEHVQQALDKHEG 120 

I ) t i I M M : t t I I I I M I I I I I t I I I I I I : I I I I I I M I I I I I i I I I I I I I I I I M 
orfl38ng MRQAGLNPDTQTVKAVFAETAKCGLELAPAFFKKPEDIETMFKAVHGWEHVQQALDKGEG 120 



orf 138. pep LLF 123 
ill 

orfl38ng LLFITPHIGS YDLGGRY I SQQLPFHLTAMYKPPKIKAIDKIMQAGRVRGKGKTAPTG IQG 180 

The complete length ORF138ng nucleotide sequence <SEQ ID 571> is: 



1 ATGTTTCGTT TACAATTCAG GCTGTTTCCC CCTTTGCGAA CCGCCATGCA 

51 CATCCTGTTG ACCGCCCTGC TCAAATGCCT CTCCCTGCTG TCGCTTTCCT 

101 GTCTGCACAC GCTGGGAAAC CGGCTCGGAC ATCTGGCGTT TTACCTTTTA 

151 AAGGAAGACC GCGCGCGCAT CGTCGCCAAT ATGCGGCAGG CGGGTTTGAA 

201 CCCCGACACG CAGACGGTCA AAGCCGTTTT TGCGGAAACG GCAAAATGCG 

251 GTTTGGAACT TGCCCCCGCG TTTTTCAAAA AACCGGAAGA CATCGAAACA 

301 ATGTTCAAAG CGGTACACGG CTGGGAACAC GTGCAGCAGG CTTTGGACAA 

351 GGGCGAAGGG CTGCTGTTCA TCACGCCGCA CATCGGCAGC TACGATTTGG 

401 GCGGACGCTA CATCAGCCAG CAGCTTCCGT TCCACCTGAC CGCCATGTAC 

451 AAGCCGCCGA AAATCAAAGC GATAGACAAA ATCATGCAGG CGGGCAGGGT 

501 GCGCGGCAAA GGCAAAACcg cgcccaccgg catACAAGGG GTCAAACAAA 

551 tcatcaAGGC CCTGCGCGCG GGCGAGGCAA CCAtcATCCT GCCCGACCAC 
601 GTCCCTTCTC CGCAGGAagg cggCGGCGTG TGGGCGGATT TTTTCGGCAA

651 ACCTGCATAc acCATGACAC TGGCGGCAAA ATTGGCACAC GTCAAAGGCG 

701 TGAAAACCCT GTTTTTCTGC TGCGAACGCC TGCCCGACGG ACAAGGCTTC 

751 GTGTTGCACA TCCGCCCCGT CCAAGGGGAA TTGAACGGCA ACAAAGCCCA 

801 CGATGCCGCC GTGTTCAACC GCAATACCGA ATATTGGATA CGCCGTTTTC 

851 CGACGCAGTA TCTGTTTATG TACAACCGCT ATAAAACGCC GTAA 

This encodes a protein having amino acid sequence <SEQ ID 572>:

1 MFRLQFRLFP PLRTAMHILL TALLKCLSLL SLSCLHTLGN RLGHLAFYLL

51 KEDRARIVAN MRQAGLNPDT QTVKAVFAET AKCGLELAPA FFKKPEDIET

101 MFKAVHGWEH VQQALDKGEG LLFITPHIGS YDLGGRYISQ QLPFHLTAMY

151 KPPKIKAIDK IMQAGRVRGK GKTAPTGIQG VKQIIKALRA GEATIILPDH 

201 VPSPQEGGGV WADFFGKPAY TMTLAAKLAH VKGVKTLFFC CERLPDGQGF 

251 VLHIRPVQGE LNGNKAHDAA VFNRNTEYWI RRFPTQYLFM YNRYKTP* 

ORF138ng and ORF138-1 show 94.3% identity over 299aa overlap: 



orf138-1.pep  MFRLQFRLFPPLRTAMHILLTALLKCLSLLPLSCLHTLGNRLGHLAFYLLKEDRARIVAN
              |||||||||||||||||||||||||||||| |||||||||||||||||||||||||||||
orf138ng      MFRLQFRLFPPLRTAMHILLTALLKCLSLLSLSCLHTLGNRLGHLAFYLLKEDRARIVAN

orf138-1.pep  MRQAGLNPDPKTVKAVFAETAKGGLELAPAFFRKPEDIETMFKAVHGWEHVQQALDKHEG
              ||||||||| :||||||||||| |||||||||:|||||||||||||||||||||||| ||
orf138ng      MRQAGLNPDTQTVKAVFAETAKCGLELAPAFFKKPEDIETMFKAVHGWEHVQQALDKGEG

orf138-1.pep  LLFITPHIGSYDLGGRYISQQLPFPLTAMYKPPKIKAIDKIMQAGRVRGKGKTAPTSIQG
              |||||||||||||||||||||||| |||||||||||||||||||||||||||||||:|||
orf138ng      LLFITPHIGSYDLGGRYISQQLPFHLTAMYKPPKIKAIDKIMQAGRVRGKGKTAPTGIQG

orf138-1.pep  VKQIIKALRSGEATIVLPDHVPSPQEGGEGVWVDFFGKPAYTMTLAAKLAHVKGVKTLFF
              |||||||||:|||||:|||||||||||| |||:|||||||||||||||||||||||||||
orf138ng      VKQIIKALRAGEATIILPDHVPSPQEGG-GVWADFFGKPAYTMTLAAKLAHVKGVKTLFF

orf138-1.pep  CCERLPGGQGFDLHIRPVQGELNGDKAHDAAVFNRNAEYWIRRFPTQYLFMYNRYKMP
              |||||| |||| ||||||||||||:|||||||||||:||||||||||||||||||| |
orf138ng      CCERLPDGQGFVLHIRPVQGELNGNKAHDAAVFNRNTEYWIRRFPTQYLFMYNRYKTP



In addition, ORF138ng is homologous to htrB protein from Pseudomonas fluorescens:

gnl|PID|e334283 (Y14568) htrB [Pseudomonas fluorescens] Length = 253

Score = 80.8 bits (196), Expect = 9e-15
Identities = 49/151 (32%), Positives = 79/151 (51%), Gaps = 6/151 (3%)

Query: 101 MFKAVHGWEHVQQALDKGEGLLFITPHIGSYD-LGGRYISQQLPFHLTAMYKPPKIKAID 159
           + + V G E +++AL  G+G++ IT H+G+++ L   Y SQ  P      Y+PPK+KA+D
Sbjct: 94  LVREVEGLEVLKEALASGKGVVGITSHLGNWEVLNHFYCSQCKPI---IFYRPPKLKAVD 150

Query: 160 KIMQAGRVRGKGKTAPTGIQGVKQIIKALRAGEATIILPDHVPSPQEGGGVWADFFGKPA 219
           ++++  RV+   K A +  +G+  +IK +R G    I  D  P P E  G++  FF   A
Sbjct: 151 ELLRKQRVQLGNKVAASTKEGILSVIKEVRKGGQVGIPAD--PEPAESAGIFVPFFATQA 208

Query: 220 YTMTLAAKLAHVKGVKTLFFCCERLPDGQGF 250
            T      +        +F    RLPDG G+
Sbjct: 209 LTSKFVPNMLAGGKAVGVFLHALRLPDGSGY 239





Based on this analysis, including the presence of a putative transmembrane domain in the
gonococcal protein, it was predicted that the proteins from N.meningitidis and N.gonorrhoeae, and
their epitopes, could be useful antigens for vaccines or diagnostics, or for raising antibodies.
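
Putative transmembrane domains are commonly screened for with a sliding-window hydropathy average. The sketch below is illustrative only and is not stated in the source to be the method used: it assumes the Kyte-Doolittle hydropathy scale, a 19-residue window and a threshold of 1.6, and the function names are hypothetical.

```python
# Kyte-Doolittle hydropathy values (assumed scale for this sketch).
KD = {
    "A": 1.8, "R": -4.5, "N": -3.5, "D": -3.5, "C": 2.5, "Q": -3.5,
    "E": -3.5, "G": -0.4, "H": -3.2, "I": 4.5, "L": 3.8, "K": -3.9,
    "M": 1.9, "F": 2.8, "P": -1.6, "S": -0.8, "T": -0.7, "W": -0.9,
    "Y": -1.3, "V": 4.2,
}


def hydropathy_profile(protein, window=19):
    """Mean hydropathy of every full window; unknown residues score 0."""
    scores = [KD.get(r, 0.0) for r in protein.upper()]
    return [sum(scores[i:i + window]) / window
            for i in range(len(scores) - window + 1)]


def candidate_tm_windows(protein, window=19, threshold=1.6):
    """1-based start positions of windows at or above the threshold."""
    return [i + 1 for i, v in enumerate(hydropathy_profile(protein, window))
            if v >= threshold]


# First 50 residues of SEQ ID 572 (ORF138ng).
orf138ng_nterm = "MFRLQFRLFPPLRTAMHILLTALLKCLSLLSLSCLHTLGNRLGHLAFYLL"
print(candidate_tm_windows(orf138ng_nterm))
```

Windows reported by such a screen are candidates only and would need confirmation by a dedicated prediction method.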



ORF138-1 (57kDa) was cloned in the pGex vectors and expressed in E.coli, as described above.
The products of protein expression and purification were analyzed by SDS-PAGE. Figure 14A
shows the results of affinity purification of the GST-fusion protein. Purified GST-fusion protein
was used to immunise mice, whose sera were used for ELISA (positive result) and FACS analysis
(Figure 14B). These experiments confirm that ORF138-1 is a surface-exposed protein, and that it
is a useful immunogen.



Example 69 

The following partial DNA sequence was identified in N.meningitidis <SEQ ID 573>:

1 . . GCGTGGTCGG CCGGCGAATC GTGGCGTGTG TTAATGGAAA GTGAAACGTG 
51 GCATGCGGTG TGGAATACTT TGCGCTTCTC GGCGGCGGCG GTGTATGCGG 
101 CAGCGGTTTT GGGTGTGGTG TATGCGGCGC CGGCGCGGCG GTCGGCGTGG 
151 ATGCGCGGGC TGATGTTTTA GCCGTTTATG GTGTCGCCGG TTTGTGTTTC 
201 GGCGGGCGTG CTGCTGCTTT ATCCGCAGTG GACGGCTTCG TTGCCGTTGC 
251 TGCTGGCGAT GTATGCGCTG CTGGCGTATC CGTTTGTGGC AAAAGATGTT 
301 TTATCAGCCT GGGATGCACT GCCGCCGGAT TACGGCAGGG CGGCGGCGGG 
351 TTTGGGTGCA AACGGCTTTC AGACGGCATG CCGCATCACG TTCCCCCTCT 
401 TGAAACCGGC GTTGCGGCGC GGTCTGACTT TGGCGGCGGC AACCTGCGTG 
451 GGCGAATTTG CGGCGACATT GTTTCTGTCG CGTCCGGAAT GGCAGACGCT 
501 GACGACTTTG ATTTATGCCT ATTTGGGACG CGCGGGTGAG GATAATTACG 
551 CGCGGGCGAT GGTGCTG.. 

This corresponds to the amino acid sequence <SEQ ID 574; ORF139>:

1 ..AWSAGESWRV LMESETWHAV WNTLRFSAAA VYAAAVLGVV YAAPARRSAW
51 MRGLMFXPFM VSPVCVSAGV LLLYPQWTAS LPLLLAMYAL LAYPFVAKDV
101 LSAWDALPPD YGRAAAGLGA NGFQTACRIT FPLLKPALRR GLTLAAATCV
151 GEFAATLFLS RPEWQTLTTL IYAYLGRAGE DNYARAMVL..

Further work revealed the complete nucleotide sequence <SEQ ID 575>: 

1 ATGGATGGAC GGCGTTGGGT GGTATGGGGT GCTTTTGCCC TGCTGCCTTC 

51 GGCTTTTTTG GCGGTAATGG TCGTTGCGCC TTTGTGGGCG GTGGCGGCGT 

101 ATGACGGTTT GGCGTGGCGC GCGGTGCTGT CGGATGCCTA TATGCTCAAA 

151 CGTTTGGCGT GGACGGTATT TCAGGCAGCG GCAACCTGTG TGCTGGTGCT 

201 GCCTTTGGGC GTGCCTGTCG CGTGGGTGCT GGCGCGGCTG GCGTTTCCGG 

251 GGCGGGCTTT GGTGCTGCGC CTGCTGATGC TGCCTTTTGT GATGCCCACG 

301 TTGGTGGCGG GCGTGGGCGT GCTGGCCCTG TTCGGGGCGG ACGGGCTGTT 

351 GTGGCGCGGC AGGCAGGATA CGCCGTATCT GTTGTTGTAC GGCAATGTGT 

401 TTTTCAACCT TCCTGTGTTG GTCAGGGCGG CGTATCAGGG GTTTGTGCAA 

451 GTGCCTGCGG CACGGCTTCA GACGGCACGG ACGTTGGGCG CGGGGGCGTG 

501 GCGGCGGTTT TGGGACATTG AAATGCCCGT TTTGCGCCCG TGGCTTGCCG 

551 GCGGCGTGTG CCTTGTCTTT CTGTATTGTT TTTCCGGGTT CGGGCTGGCG 

601 CTGCTGCTGG GCGGCAGCCG TTATGCCACG GTCGAAGTGG AAATTTACCA

651 GTTGGTCATG TTCGAACTCG ATATGGCGGT TGCTTCGGTG CTGGTGTGGC

701 TGGTGTTGGG GGTAACGGCG GCGGCAGGGT TGCTGTATGC GTGGTTCGGC 

751 AGGCGCGCGG TTTCGGATAA GGCGGTTTCC CCTGTGATGC CGTCGCCGCC 

801 GCAGTCGGTC GGGGAATATG TGCTGCTGGC GTTTGCGGCG GCGGTGTTGT 

851 CTGTGTGCTG CCTGTTTCCT TTGTTGGCAA TTGTTGTGAA AGCGTGGTCG 

901 GCCGGCGAAT CGTGGCGTGT GTTAATGGAA AGTGAAACGT GGCAGGCGGT 

951 GTGGAATACT TTGCGCTTCT CGGCGGCGGG GGTGTATGCG GCGGCGGTTT 

1001 TGGGTGTGGT GTATGCGGCG GCGGCGCGGC GGTCGGCGTG GATGCGCGGG 

1051 CTGATGTTTT TGCCGTTTAT GGTGTCGCCG GTTTGTGTTT CGGCGGGCGT 

1101 GCTGCTGCTT TATCCGCAGT GGACGGCTTC GTTGCCGTTG CTGCTGGCGA 

1151 TGTATGCGCT GCTGGCGTAT CCGTTTGTGG CAAAAGATGT TTTATCAGCC 

1201 TGGGATGCAC TGCCGCCGGA TTACGGCAGG GCGGCGGCGG GTTTGGGTGC 

1251 AAACGGCTTT CAGACGGCAT GCCGCATCAC GTTCCCCCTC TTGAAACCGG 

1301 CGTTGCGGCG CGGTCTGACT TTGGCGGCGG CAACCTGCGT GGGCGAATTT 

1351 GCGGCGACAT TGTTTCTGTC GCGTCCGGAA TGGCAGACGC TGACGACTTT 

1401 GATTTATGCC TATTTGGGAC GCGCGGGTGA GGATAATTAC GCGCGGGCGA 

1451 TGGTGCTGAC ATTGCTGTTG GCGGCGTTCG CGCTGGGTAT TTTCCTGCTG 

1501 TTGGACGGCG GCGAAGGCGG AAAACAGACG GAAACGTTAT AA 

This corresponds to the amino acid sequence <SEQ ID 576; ORF139-1>:

1 MDGRRWVVWG AFALLPSAFL AVMVVAPLWA VAAYDGLAWR AVLSDAYMLK
51 RLAWTVFQAA ATCVLVLPLG VPVAWVLARL AFPGRALVLR LLMLPFVMPT
101 LVAGVGVLAL FGADGLLWRG RQDTPYLLLY GNVFFNLPVL VRAAYQGFVQ
151 VPAARLQTAR TLGAGAWRRF WDIEMPVLRP WLAGGVCLVF LYCFSGFGLA
201 LLLGGSRYAT VEVEIYQLVM FELDMAVASV LVWLVLGVTA AAGLLYAWFG
251 RRAVSDKAVS PVMPSPPQSV GEYVLLAFAA AVLSVCCLFP LLAIVVKAWS
301 AGESWRVLME SETWQAVWNT LRFSAAAVYA AAVLGVVYAA AARRSAWMRG
351 LMFLPFMVSP VCVSAGVLLL YPQWTASLPL LLAMYALLAY PFVAKDVLSA
401 WDALPPDYGR AAAGLGANGF QTACRITFPL LKPALRRGLT LAAATCVGEF
451 AATLFLSRPE WQTLTTLIYA YLGRAGEDNY ARAMVLTLLL AAFALGIFLL
501 LDGGEGGKQT ETL*
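
The position of a partial sequence within the corresponding complete sequence can be found by searching for its first few nucleotides. The following is a minimal sketch; the function name `locate`, the 30-nucleotide probe length and the exact-match search are assumptions made for illustration. It uses the start of SEQ ID 573 and rows 851 and 901 of SEQ ID 575.

```python
def locate(partial, complete, complete_start=1, probe_len=30):
    """Return the 1-based position in the complete sequence at which the
    partial sequence begins, or None if the probe is not found."""
    def clean(s):
        # Keep only base symbols; drops spaces and the leading '..' marker.
        return "".join(ch for ch in s.upper() if ch in "ACGTN")

    probe = clean(partial)[:probe_len]
    idx = clean(complete).find(probe)
    return None if idx < 0 else complete_start + idx


# First row of the partial sequence SEQ ID 573.
partial_573 = ". . GCGTGGTCGG CCGGCGAATC GTGGCGTGTG TTAATGGAAA GTGAAACGTG"
# Rows 851 and 901 of the complete sequence SEQ ID 575.
complete_575_rows = ("CTGTGTGCTG CCTGTTTCCT TTGTTGGCAA TTGTTGTGAA AGCGTGGTCG "
                     "GCCGGCGAAT CGTGGCGTGT GTTAATGGAA AGTGAAACGT GGCAGGCGGT")
print(locate(partial_573, complete_575_rows, complete_start=851))  # prints 892
```

An exact-match probe fails if the probe region contains a sequencing ambiguity, in which case a later stretch of the partial sequence can be used instead.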

Computer analysis of this amino acid sequence gave the following results: 



Homology with a predicted ORF from N.meningitidis (strain A)

ORF139 shows 94.7% identity over a 189aa overlap with an ORF (ORF139a) from strain A of N.
meningitidis:



10 20 30 

orf 139 . pep AWSAGESWRVLMESETWHAVWNTLRFSAAA 

1 1 1 1 1 i 1 1 1 1 1 1 i 1 1 1 1 : 1 1 1 1 1 unit 

orf 139a QSVGEYVLLAF AAAVXSVCCLFXLLAIW KAW SAGE SWRVLMESETWQAVWNTXRFS AAA 

270 280 290 300 310 320 



40 50 60 70 80 90 

or f 13 9 . pep VYAAAVLGWYAAP ARRSAWMRGLM FXPFMVSPVCVSAGVLLL YPQWTAS LPLLLAMYAL 

I I I I I I I i I I I 1 t llllilllllll lllltlllllltllll t I I I I t I t I I I 1 t I I t 
or f 13 9a VYAAAVLGWYAAA ARRSAWMRGLM FLPFMVSPVCVSAGVLLL XPQWTAS LPLLLAMYAL 

330 340 350 360 370 380 



100 110 120 130 140 150 

orf 139 .pep LAYPFVA KDVLSAWDALPPDYGRAAAGLGANGFQTACRITFPLLKPALRRGLTLAAATCV 

I I t I I I I I I I I i I I I M i I I I I I I I I M I I i M 1 I I I I I I I I I I I I I I I i I I I M I i I t 
orf 13 9a LAYPFVA KDVLSAXDALPPDYGRAAAGLGANGFQTACRITFPLLKPALRRGLTLAAATCV 

390 400 410 420 430 440 



160 170 180 189 

orf 139. pep GEFAATLFLSRPEWQTLTTLIYAYLGRAGEDNYARAMVL 

I I ) i I I I I It I I i I t II II t t I 1 I t I I t I II I I I I 
orf 139a GEFAATLFXSRXEWCyTLTTLIYAYXGRAGXDNYARAM VLTLLLAAFALGXFLLL DGGEGG 

450 460 470 480 490 500 

The complete length ORF139a nucleotide sequence <SEQ ID 577> is: 



1 ATGGATGGAC GGCGTTGGGC GGTATGGGGT GCTTTTGCCC TGCTGCCTTC
51 GGCTTTTTTG GCGGCAATGG TCGTTGCGCC TTTGTGGGCG GTGGCGGCGT
101 ATGACGGTTT GGCGTGGCGC GCGGTGCTGT CGGATGCCTA TATGCTCAAA
151 CGTTTGGCGT GGACGGTATT TCAGGCAGCG GCAACCTGTG TGCTGGTGCT
201 GCCTTTGGGC GTGCCTGTCG CGTGGGTGCT GGCGCGGCTG GCGTTTCCGG
251 GGCGGGCTTT GGTGCTGCGC CTGCTGATGC TGCCTTTTGT GATGCCCACG
301 TTGGTGGCGG GCGTGGGCGT GCTGGCTCTG TTCGGGGCGG ACGGCCTGTN
351 GTGGCGCGGC TGGCAGGATA CGCCGTATCT GTTGTTGTAC GGCAATGTGT
401 TTTTTNACCT TCCTGTGTTG GTCAGGGCGG CATATCAGGG GTTTGTGCAA
451 GTGCCTGCGG CACGGCTTCA GACGGCACNG ACATTGGGCG CGGGGGCGTG
501 GCGGCGGTTT TGGGACATTG AAATGCCCGT TTTGCGCCCG TGGCTTGCCG
551 GCGGCGTGTG CCTTGTCTTC CTGTATTGTT TTTCGGGGTT CGGGCTGGCA
601 TTGCTGCTGG GCGGCAGCCG TTATGCCACG GTCGAAGTGG AAATTTACCA
651 GTTGGTCATG TTCGAACTCG ATATGGCGGT TGCTTCGGTG CTNGTGTGGC
701 TGGTGTNGGG GGTAACNGCG GCGGCAGGGT TGCTGTATGC GTGGTTCGGC
751 AGGCGCGCGG TTTCGGATAA GGCNGTTTCC CCTGTGATGC CGTCGCCGCC
801 GCAGTCGGTC GGGGAATATG TGCTNCTGGC GTTTGCGGCG GCGGTGTNGT
851 CTGTGTGCTG CCTGTTTCNT TTGTTGGCAA TTGTTGTGAA AGCGTGGTCG
901 GCCGGCGAAT CGTGGCGTGT GTTAATGGAA AGTGAAACGT GGCAGGCGGT
951 GTGGAATACT NTGCGCTTCT CGGCGGCGGC GGTGTATGCG GCGGCGGTTT
1001 TGGGTGTGGT GTATGCGGCG GCGGCGCGGC GGTCGGCGTG GATGCGCGGG
1051 CTGATGTTTT TGCCGTTTAT GGTGTCGCCG GTTTGTGTTT CGGCGGGCGT
1101 GCTGCTGCTT NATCCGCAGT GGACGGCTTC GTTGCCGCTG CTGCTGGCGA
1151 TGTATGCGCT GCTGGCGTAT CCGTTTGTGG CAAAAGATGT TTTATCAGCC
1201 TGNGATGCAC TGCCGCCGGA TTACGGCAGG GCGGCGGCGG GTTTGGGTGC
1251 AAACGGCTTT CAGACGGCAT GCCGCATCAC GTTCCCCCTC TTGAAACCGG
1301 CGTTGCGGCG CGGTCTGACT TTGGCGGCGG CAACCTGCGT GGGCGAATTT
1351 GCGGCAACCT TGTTCNTGTC GCGTCNCGAG TGGCAGACGC TGACGACTTT
1401 GATTTATGCC TATNTGGGAC GCGCGGGTGA NGATAATTAC GCGCGGGCGA
1451 TGGTGCTGAC ATTGCTGTTG GCGGCGTTCG CGCTGGGTAT NTTCCTGCTG
1501 TTGGACGGCG GCGAAGGCGG AAAACGGACG GAAACGTTAT AA

This encodes a protein having amino acid sequence <SEQ ID 578>: 

1 MDGRRWAVWG AFALLPSAFL AAMWAPLWA VAAYDGLAWR AVLSDAYMLK 

51 RLAWTVFQAA ATCVLVLPLG VPVAWV LARL AFPGRALVLR LLML PFVMPT 

101 LVAGVGVLAL FGA DGLXWRG WQDTPYLLLY GNVFFXLPVL VRAAYQGFVQ 

151 VPAARLQTAX TLGAGAWRRF WDIEMPVLRP WLAGG VCLVF LYCFSGFGLA 

201 LLLGGSRYAT VEVEIYQLVM FELDMAV ASV LVWLVXGVTA AAGLL YAWFG 

251 RRAVSDKAVS PVMPSPPQSV GEYVLLAF AA AVXSVCCLFX LLAIW KAWS 

301 AGESVfRVLME SETWQAVWNT XRFS AAAVYA AAVLGWYAA AA RRSAWMRG 

351 LM FLPFMVSP VCVSAGVLLL XPQWTAS LPL LLAMYALLAY PFVAK DVLSA 

401 XDALPPDYGR AAAGLGANGF QTACRITFPL LKPALRRGLT LAAATCVGEF

451 AATLFXSRXE WQTLTTLIYA YXGRAGXDNY ARA MVLTLLL AAFALGXFLL 

501 LDGGEGGKRT ETL* 

ORF139a and ORF139-1 show 96.5% homology over a 514aa overlap: 

orf 139a . pep MDGRRWAVWGAFALLPSAFLAAMVVAPLWAVAAYDGLAWRAVLSDAYMLKRLAWTVFQAA 
I I M I I : i I I I I I t I I t I M I : H M I I M I I I I I I I I I I I I I I I I I I I I I I I I t I I I I I 
orf 139-1 MDGRRWWWGAFALLPSAFLAVMWAPLWAVAAYDGLAWRAVLSDAYMLKRLAWTVFQAA 

orf 139a . pep ATCVLVLPLGVPVAWVLARLAFPGRALVLRLLMLPFVMPTLVAGVGVLALFGADGLXWRG 
i I I I I I M i I I I I I I I I t I I i I I I I I I I I I I I I I I I I I i [ I I I I I I I i It I I I I I I IN 
orf 139-1 ATCVLVLPLGVPVAWVLARLAFPGRALVLRLLMLPFVMPTLVAGVGVLALFGADGLLWRG 

orf 139a . pep WQDTPYLLLYGNVFFXLPVLVRAAYQGFVQVPAARLQTAXTLGAGAWRRFWDIEMPVLRP 
I I I I I I I I t I I I I I M I I I I I I I I I I I I I I I 1 I I I I i I { ) M I I I M t i t i I I I i t [ 
or f 1 39-1 RQDTPYLLLYGNVFFNLPVLVRAAYQGFVQVPAARLQTARTLGAGAWRRFWDIEMPVLRP 

or f 1 3 9a . pep WLAGGVCLVFLYCFSGFGLALLLGGSRYATVEVEIYQLVMFELDMAVASVLVWLVXGVTA 

I I I t I I i I I I I I i I I I I I t I I I I I I I I I I I 1 i I I I M I I t M I I t I I I I t I I I I I lilt 
orf 139-1 WLAGGVCLVFLYCFSGFGLALLLGGSRYATVEVEIYQLVMFELDMAVASVLVWLVLGVTA 

orf 139a . pep AAGLLYAWFGRRAVSDKAVSPVMPSPPQSVGEYVLLAFAAAVXSVCCLFXLLAIWKAWS 

II I II II II II I 1 It II t II II II M t t I t I I I II I M II II I I I I II i I I I I It 1 I 1 
orf 139-1 AAGLLYAWFGRRAVSDKAVSPVMPSPPQSVGEYVLLAFAAAVLSVCCLFPLLAIWKAWS 

orf 139a , pep AGESWRVLMESETWQAVWNTXRFSAAAVYAAAVLGWYAAAARRSAWMRGLMFLPFMVSP 
I I I I I I I I II I I I I I t I I I I I t I t II I I I II I I I I I I I I I I I I I I I I I I I I I I I M t t I 
orf 139-1 AGESWRVLMESETWQAVWNTLRFSAAAVYAAAVLGWYAAAARRSAWMRGLMFLPFMVSP 

orf 139a . pep VCVSAGVLLLXPQWTASLPLLLAMYALLAYPFVAKDVLSAXDALPPDYGRAAAGLGANGF 
I I I I I I I 11 t I I I I I I I I t I I I I I II I I i I I I I I I I I I I I t I I I 1 I I I I I I I II t I M 
orf 13 9-1 VCVSAGVLLLYPQWTASLPLLLAMYALLAYPFVAKDVLSAWDALPPDYGRAAAGLGANGF 

or f 13 9a . pep QTACRITFPLLKPALRRGLTLAAATCVGEFAATLFXSRXEWQTLTTLIYAYXGRAGXDNY 
I I I I I I I I II I I I I I I I I I t I I I I I t I I I I I I I I I II I I I I M I t I I I I I II I til 
orf 139-1 QTACRITFPLLKPALRRGLTIAAATCVGEFAATLFLSRPEWQTLTTLIYAYLGRAGEDNY 

orf 139a . pep ARAMVLTLLLAAFALGXFLLLDGGEGGKRTETLX 
I I I II I I I I I I I I t I t I I I II I I I t 1 I : 1 I I t I 
orfl39-l ARAMVLTLLLAAFALGI FLLLDGGEGGKQTETLX 

Homology with a predicted ORF from N. gonorrhoeae

ORF139 shows 95.2% identity over a 189aa overlap with a predicted ORF (ORF139ng) from
N. gonorrhoeae:

orf 139. pep AWSAGESWRVLMESETWHAVWNTLRFSAAA 30 

I I I I I I I 1 I t I I I I I t : I I I t I I I I I I I I 
orf 139ng QSVGEYVLLAFSVAVLSVCCLFPLSAIWKAWSAGESRRVLMESETWQAVWNTLRFSAAA 327 



or f 1 3 9 . pep VYAAAVLGVVYAAPARRSAWMRGLMFXPFMVSPVCVSAGVLLLYPQWTASLPLLLAMYAL 90 

1:11111111111 III : t I t I I : I I I I I t 1 t I I I I I I I I 1 I t I I I I I I I II I I I I I 
orfl39ng VFAAAVLGWYAAAARRLVWMRGLVFLPFMVSPVCVSAGVLLLYPGWTASLPLLLAMYAL 387 
orf139.pep LAYPFVAKDVLSAWDALPPDYGRAAAGLGANGFQTACRITFPLLKPALRRGLTLAAATCV 150

I I I I I I I I I M I I I I I I t I I I I I M I I I I I I I I I I I I I I I t I I M I I I I I I I I i i t t I I I 

or f 13 9ng LAYPFVAKDVLSAWDALPPDYGRAAAGLGANGFQTACRITFPLLKPALRRGLTLAAATCV 447 

orf 139 . pep GEFAATLFLSRPEWQTLTTLIYAYLGRAGEDNYARAMVL 189 

I I i 1 1 1 1 n 1 1 1 1 1 1 M i 1 1 1 1 1 1 i 1 1 1 1 1 1 1 1 1 1 1 1 1 1 

orfl39ng GEFAATLFLSRPEWQTLTTLIYAYLGRAGEDNYARAMVLTLLLSAFAVCIFLLLDNGEGG 507 

The complete length ORF139ng nucleotide sequence <SEQ ID 579> is predicted to encode a 
protein having amino acid sequence <SEQ ID 580>: 

1 MDGRCWAVRG AFSLLPSAFL AVMVVAPLWA VAAYDGLAWR AVLSDAYMLK

51 RLAWTVFQAA ATCVLVLPLG VPVAWVLARL AFPGRALVLR LLMLPFVMPT 

101 LVAGVGVLAL FGADGLLWRG RQDTPYLLLY GNVFFNLPVL VRAAYQG FAQ 

151 VPAARLQTAR TLGAGAWRPF WDIEMP VLRP WLAGGVCLVF LYCFSGFGLA 

201 LLLGGSRYAT VEVEIYQLVM FELDMAGASA LVWLVLGVTA AAGLLYAWFG 

251 RRAVSDKAVS PVMPSPPQSV GEYVLLAFSV AVLSVCCLFP LSAIVVKAWS

301 AGESRRVLME SETWQAVWNT LRFSAAAVFA AAVLGVVYAA AARRLVWMRG

351 LVFLPFMVSP VCVSAGVLLL YPGWTASLPL LLAMYALLAY PFVAKDVLSA

401 WDALPPDYGR AAAGLGANGF QTACRITFPL LKPALRRGLT LAAATCVGEF

451 AATLFLSRPE WQTLTTLIYA YLGRAGEDNY ARAMVLTLLL SAFAVCIFLL 

501 LDNGEGGKRT ETL* 

Further work revealed a variant gonococcal DNA sequence <SEQ ID 581>: 

1 ATGGATGGAC GGTGTTGGGC GGTACGGGGT GCTTTTTCCC TGCTGCCTTC 

51 GGCTTTTTTG GCGGTAATGG TCGTTGCGCC TTTGTGGGCG GTGGCGGCGT 

101 ATGACGGTTT GGCGTGGCGC GCGGTGCTGT CGGATGCCTA TATGCTCAAA 

151 CGTTTGGCGT GGACGGTGTT TCAGGCGGCG GCAACCTGTG TGCTGGTGCT 

201 GCCTTTGGGC GTGCCTGTCG CGTGGGTGCT GGCGCGGCTG GCGTTCCCGG 

251 GGCGGGCTTT GGTGCTGCGC CTGCTGATGC TGCCGTTTGT GATGCCCACG 

301 CTGGTGGCGG GCGTGGGCGT GCTGGCTCTG TTCGGGGCGG ACGGGCTGTT 

351 GTGGCGCGGC CGGCAGGATA CGCCGTATCT GTTGTTGTAC GGCAATGTGT 

401 TTTTCAACCT GCCCGTGTTG GTCAGGGCGG CGTATCAGGG GTTTGCTCAA 

451 GTGCCTGCGG CACGGCTTCA GACGGCACGG ACGTTGGGCG CGGGGGCGTG 

501 GCGGCGGTTT TGGGACATTG AAATGCCCGT TTTGCGCCCG TGGCTTGCCG 

551 GCGGCGTGTG CCTTGTCTTC CTGTATTGTT TTTCGGGGTT CGGGCTGGCA 

601 TTGCTGTTGG GCGGCAGCCG TTATGCCACG GTCGAAGTGG AAATTTACCA 

651 GTTGGTTATG TTCGAACTCG ATATGGCGGG GGCTTCGGCG CTGGTGTGGC 

701 TGGTGTTGGG GGTAACGGCG GCGGCAGGGT TGCTGTATGC GTGGTTCGGC 

751 AGGCGCGCGG TTTCGGATAA GGCGGTTTCC CCCGTGATGC CGTCGCCGCC 

801 GCAATCGGTG GGGGAATATG TATTGCTGGC ATTTTCGGTG GCGGTGTTGT 

851 CCGTGTGCTG CCTGTTTCCT TTGTCGGCAA TTGTTGTGAA AGCGTGGTCG 

901 GCCGGCGAAT CGCGGCGTGT GTTAATGGAA AGTGAAACGT GGCAGGCAGT 

951 GTGGAATACt ttGCGCTTTT CGGCGGCGGC GGTGTTTGCG GCGGCGGTTT 

1001 TGGGTGTGGT GTATGCGGCG GCGGCGCGGC GGCTGGTGTG GATGCGCGGA 

1051 CTGGTGTTTT TACCGTTTAT GGTGTCGCCG GTTTGTGTTT CGGCGGGCGT 

1101 GCTGCTGCTT TATCCGGGGT GGACGGCTTC GTTACCGCTG CTGCTGGCGA 

1151 TGTATGCGCT GCTGGCGTAT CCGTTTGTGG CAAAAGATGT TTTATCGGCC 

1201 TGGGATGCAC TGCCGCCGGA TTACGGCAGG GCGGCGGCAG GTTTGGGCGC 

1251 AAACGGCTTT CAGACGGCAT GCCGTATCAC GTTCCCCCTC TTGAAACCGG 

1301 CGTTGCGGCG CGGTCTGACT TTGGCGGCGG CGACGTGTGT GGGCGAATTT 

1351 GCGGCAACCT TGTTCCTGTC GCGTCCGGAA TGGCAGACGT TGACGACTTT 

1401 GATTTATGCC TATTTGGGGC GTGCGGGTGA GGACAATTAT GCGCGGGCAA 

1451 TGGTGTTGAC ATTGCTGTTG TCGGCATTTG CGGTGTGCAT TTTCCTGCTG 

1501 TTGGACAACG GCGAAGGCGg aaaACGGACG GAAACGTTAT AA 

This corresponds to the amino acid sequence <SEQ ID 582; ORF139ng-1>:

1 MDGRCWAVRG AFSLLPSAFL AVMVVAPLWA VAAYDGLAWR AVLSDAYMLK

51 RLAWTVFQAA ATCVLVLPLG VPVAWVL ARL AFPGRALVLR LLMLP FVMPT 

101 LVAGVGVLAL FGA DGLLWRG RQDTPYLLLY GNVFFNLPVL VRAAYQGFAQ 

151 VPAARLQTAR TLGAGAWRRF WDIEMPVLRP WLAGG VCLVF LYCFSGFGLA

201 LLLGGSRYAT VEVEIYQLVM FELDMAG ASA LVWLVLGVTA AAGLL YAWFG 

251 RRAVSDKAVS PVMPSPPQSV GEYVLLAFS V AVLSVCCLFP LSAIVV KAWS

301 AGESRRVLME SETWQAVWNT LRFS AAAVFA AAVLGVVYAA AA RRLVWMRG

351 LV FLPFMVSP VCVSAGVLLL YPGWTASL PL LLAMYALLAY PFVAK DVLSA 

401 WDALPPDYGR AAAGLGANGF QTACRITFPL LKPALRRGLT LAAATCVGEF 

451 AATLFLSRPE WQTLTTLIYA YLGRAGEDNY ARA MVLTLLL SAFAVCIFLL

501 LDNGEGGKRT ETL* 

ORF139ng-1 and ORF139-1 show 95.9% identity over a 513aa overlap:

or f 13 9ng MDGRCWAVRGAFSLLPSAFLAVMWAPLWAVAAYDGLAWRAVLSDAYMLKRLAWTVFQAA 
I I I I 1:1 I I I : M I I I I i I I I I I t I I t I I I I I I I I I I I I I I I I I M M I I I I I M I 1 I 
or f 13 9-1 MDGRRWWWGAFALLPSAFLAVMWAPLWAVAAYDGLAWRAVLSDAYMLKRLAWTVFQAA 

orf139ng ATCVLVLPLGVPVAWVLARLAFPGRALVLRLLMLPFVMPTLVAGVGVLALFGADGLLWRG
I M I i i I I M i I I I I I I I I I I I I I I I M I I I I I I I I M I I I I I I I I 1 I I I I I I I i M t I I 
orf139-1 ATCVLVLPLGVPVAWVLARLAFPGRALVLRLLMLPFVMPTLVAGVGVLALFGADGLLWRG

orf139ng RQDTPYLLLYGNVFFNLPVLVRAAYQGFAQVPAARLQTARTLGAGAWRRFWDIEMPVLRP
I I I I I I I I I I M I I I I I [ I I I I I I I 1 [ I : I I I I I I I I I t I I M I I I t i I I M I I I t I [ I I 
orf139-1 RQDTPYLLLYGNVFFNLPVLVRAAYQGFVQVPAARLQTARTLGAGAWRRFWDIEMPVLRP

orfl39ng WLAGGVCLVFLYCFSGFGLALLLGGSRYATVEVEIYQLVMFELDMAGASALVWLVLGVTA 
t I I I I I I I I t I I t I i i I I i M I I t I I I i I I I I I I I I I t I I M I I I I I I : I I I I I I I I M 
orf 139-1 WLAGGVCLVFLYCFSGFGLALLLGGSRYATVEVEIYQLVMFELDMAVASVLVWLVLGVTA 

orfl39ng AAGLLYAWFGRRAVSDKAVSPVMPSPPQSVGEYVLLAFSVAVLSVCCLFPLSAIWKAWS 
M I I I I I 1 I I I M I I I t I I I t I I I I I I i f I I I t I I I I t : : I I I t t I I I I i I i I I i I I I I 
orf 13 9-1 AAGLLYAWFGRRAVSDKAVSPVMPSPPQSVGEYVLLAFAAAVLSVCCLFPLLAIWKAWS 

orfl39ng AGESRRVLMESETWQAVWNTLRFSAAAVFAAAVLGVVYAAAARRLVWMRGLVFLPFMVSP 
nil 1 t I II I I I II I I I I I I II II I II: I t I I II II I I I 1 I It : I I I I I : II I I I II I 
orf139-1 AGESWRVLMESETWQAVWNTLRFSAAAVYAAAVLGVVYAAAARRSAWMRGLMFLPFMVSP

orfl39ng VCVSAGVLLLYPGWTASLPLLLAMYALLAYPFVAKDVLSAWDALPPDYGRAAAGLGANGF 
I II I I I I I I I M II I I I I I I I I I i I I I I I II I I II I I II I II I I I II II t I I II II I II 
orf 139-1 VCVSAGVLLLYPQWTASLPLLLAMYALLAYPFVAKDVLSAWDALPPDYGRAAAGLGANGF 

orfl39ng QTACRITFPLLKPALRRGLTLAAATCVGEFAATLFLSRPEWQTLTTLIYAYLGRAGEDNY 
I I I I I I I I I I I I t I I II I II I I I I II M II M I I I i I I I I I II I I I I I I I I I II I I I I I I 
orf 139-1 QTACRITFPLLKPALRRGLTLAAATCVGEFAATLFLSRPEWQTLTTLIYAYLGRAGEDNY 

orfl39ng ARAMVLTLLLSAFAVCIFLLLDNGEGGKRTETL 
1111111111:111: 1 i M I I : I I I I I: I I I I 
orfl39-l ARAMVLTLLLAAFALGI FLLLDGGEGGKQTETL 

Based on the presence of a predicted binding-protein-dependent transport systems inner membrane
component signature (underlined) in the gonococcal protein, it is predicted that the proteins from
N. meningitidis and N. gonorrhoeae, and their epitopes, could be useful antigens for vaccines or
diagnostics, or for raising antibodies.



Example 70 

The following partial DNA sequence was identified in N. meningitidis <SEQ ID 583>:

1 ATGGACGGCT GGACACAGAC GCTGTCCGCG CAAACCCTGT TGGGCATTTC 

51 GGCGGCGGCA ATCATCCTCA TTCTGATTTT AATCGTCAGA TTCCGCATCC 

101 ACGCGCTGCT GACACTGGTC ATCGTCAGCC TGCTGACGGC TTTGGCAACC 

151 GGTTTGCCCA CAGGCAGCAT TGTCAAAGAC ATACTGGTCA AAAACTTCGG 

201 CGGCACGCTC GGCGGCGTGG CGCTTCTGGT CGGCCTGGGC GCGATGCTCG 

251 AACGTTTGGT C. . . 

This corresponds to the amino acid sequence <SEQ ID 584; ORF140>:

1 MDGWTQTLSA QTLLGISAAA IILILILIVR FRIHALLTLV IVSLLTALAT 
51 GLPTGSIVKD ILVKNFGGTL GGVALLVGLG AMLERLV. . 
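
The correspondence between the partial nucleotide sequence <SEQ ID 583> and ORF140 can be checked by translating the first codons with the standard genetic code. The following is a minimal Python sketch for illustration only; the function name `translate` and the choice to render ambiguous codons as X are assumptions made here and are not part of the original disclosure.

```python
from itertools import product

# Standard genetic code; codons are ordered TTT, TTC, TTA, TTG, TCT, ...
# (base order T, C, A, G, third position varying fastest).
BASES = "TCAG"
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    "".join(codon): aa
    for codon, aa in zip(product(BASES, repeat=3), AMINO_ACIDS)
}


def translate(dna: str) -> str:
    """Translate a DNA string in reading frame 1.

    Spaces and position numbers are ignored; any codon containing an
    ambiguous base (for example N) is reported as 'X'.
    """
    seq = "".join(ch for ch in dna.upper() if ch.isalpha())
    usable = len(seq) - len(seq) % 3  # drop a trailing partial codon
    return "".join(
        CODON_TABLE.get(seq[i:i + 3], "X") for i in range(0, usable, 3)
    )


# First 30 nucleotides of SEQ ID 583 as printed above.
print(translate("ATGGACGGCT GGACACAGAC GCTGTCCGCG"))  # MDGWTQTLSA
```

The printed residues agree with the first ten residues of ORF140 as listed.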

Further work revealed the complete nucleotide sequence <SEQ ID 585>: 

1 ATGGACGGCT GGACACAGAC GCTGTCCGCG CAAACCCTGT TGGGCATTTC 

51 GGCGGCGGCA ATCATCCTCA TTCTGATTTT AATCGTCAAA TTCCGCATCC 

101 ACGCGCTGCT GACACTGGTC ATCGTCAGCC TGCTGACGGC TTTGGCAACC 

151 GGTTTGCCCA CAGGCAGCAT TGTCAACGAC ATACTGGTCA AAAACTTCGG 
201 CGGCACGCTC GGCGGCGTGG CGCTTCTGGT CGGCCTGGGC GCGATGCTCG

251 GACGTTTGGT CGAAACATCC GGCGGCGCAC AGTCGCTGGC GGACGCGCTG

301 ATCCGGATGT TCGGCGAAAA ACGCGCACCG TTCGCGCTGG GCGTTGCCTC

351 GCTGATTTTC GGCTTCCCGA TTTTCTTCGA TGCCGGACTA ATCGTCATGC

401 TGCCCATCGT GTTCGCCACC GCACGGCGCA TGAAACAGGA CGTACTGCCC

451 TTCGCGCTTG CCTCCATCGG CGCATTTTCC GTCATGCACG TCTTCCTGCC

501 GCCCCATCCG GGCCCGATTG CCGCTTCCGA ATTTTACGGC GCGAACATCG

551 GCCAAGTTTT GATTTTGGGT CTGCCGACCG CCTTCATCAC ATGGTATTTC

601 AGCGGCTATA TGCTCGGCAA AGTGTTGGGG CGCACCATCC ATGTTCCCGT

651 TCCCGAACTG CTCAGCGGCG GCACGCAAGA CAACGACCTG CCGAAAGAAC

701 CTGCCAAAGC AGGAACGGTC GTCGCCATCA TGCTGATTCC CATGCTGCTG

751 ATTTTCCTGA ATACCGGCGT ATCGGCCCTC ATCAGCGAAA AACTCGTAAG

801 TGCGGACGAA ACCTGGGTTC AGACGGCAAA AATAATCGGT TCGACACCGA

851 TCGCCCTTCT GATTTCCGTA TTGGTCGCAC TGTTTGTCTT GGGACGCAAA

901 CGCGGCGAAA GCGGCAGCGC GTTGGAAAAA ACCGTGGACG GCGCACTCGC

951 CCCCGTCTGT TCCGTGATTC TGATTACCGG CGCGGGCGGT ATGTTCGGCG

1001 GCGTTTTGCG CGCTTCCGGC ATCGGCAAGG CACTCGCCGA CAGCATGGCG

1051 GATTTGGGCA TTCCCGTCCT TTTGGGCTGT TTCCTTGTCG CCTTGGCACT

1101 GCGTATCGCG CAAGGTTCGG CAACCGTCGC CCTGACCACC GCCGCCGCGC

1151 TGATGGCTCC TGCCGTTGCC GCCGCCGGCT TTACCGACTG GCAGCTCGCC

1201 TGTATCGTAT TGGCAACGGC GGCAGGTTCG GTCGGTTGCA GCCACTTCAA

1251 CGACTCCGGC TTCTGGCTGG TCGGCCGTCT CTTGGACATG GACGTACCGA

1301 CCACGCTGAA AACCTGGACG GTCAACCAAA CCCTCATCGC ACTCATCGGC

1351 TTTGCCTTGT CCGCACTGCT GTTCGCCATC GTCTGA



This corresponds to the amino acid sequence <SEQ ID 586; ORF140-1>:



1 MDGWTQTLSA QTLLGISAAA IILILILIVK FRIHALLTLV IVSLLTALAT 

51 GLPTGSIVND ILVKNFGGTL GGVALLVGLG AMLGRLV ETS GGAQSLADAL 

101 IRMFGEKRAP FALGVAS LIF GFPIFFDAGL IVML PIVFAT ARRMKQDVLP 

151 FALASIGAFS VMHV FLPPHP GPIAASEFYG ANIGQVLILG LPTAFITWYF 

201 SGYMLGKVLG RTIHVPVPEL LSGGTQDNDL PKEPA KAGTV VAIMLIPMLL 

251 IFLNTGVSAL ISEKLVSADE TWVQTAKIIG S TPIALLISV LVALFVLG RK 

301 RGESGSALEK TVDGALAPVC SVILITGAGG MFGGVL RASG IGKALADSMA 

351 DLG IPVLLGC FLVALALRIA QGSAT VALTT AAALMAPAVA AA GFTDWQLA 

401 CIVLATAAGS VGCSHFNDSG FWLVGRLLDM DVPTTLKTWT VNQT LIALIG 

451 FALSALLFAI V * 

Computer analysis of this amino acid sequence gave the following results: 



Homology with a predicted ORF from N. meningitidis (strain A)

ORF140 shows 95.4% identity over an 87aa overlap with an ORF (ORF140a) from strain A of
N. meningitidis:

10 20 30 40 50 60 

orf140.pep MDGWTQTLSAQTLLGISAAAIILILILIVRFRIHALLTLVIVSLLTALATGLPTGSIVKD
I i I M t I 1 I I I I I I M I I I I I I I I t M I I : I I I I I I I I I t t I I ) I I I I I I I I I I I t I I: I 
orf140a MDGWTQTLSAQTLLGISAAAIILILILIVKFRIHALLTLVIVSLLTALATGLPTGSIVND

10 20 30 40 50 60 



70 80 
orf140.pep ILVKNFGGTLGGVALLVGLGAMLERLV
: t I i I i I I I I I I I I I t I I I I I I I Ml 
orf140a VLVKNFGGTLGGVALLVGLGAMLGRLVETSGGAQSLADALIRMFGEKRAPFALGVASLIF

70 80 90 100 110 120 
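
Because ORF140 and the first 87 residues of ORF140a align without gaps, the quoted identity can be checked by a direct position-by-position comparison. The following is a minimal Python sketch under that assumption; the helper name `percent_identity` is hypothetical, the sequences are copied from <SEQ ID 584> and <SEQ ID 588>, and an ungapped comparison is a simplification of what an alignment program computes.

```python
def percent_identity(a: str, b: str) -> float:
    """Percentage of identical positions in two equal-length, ungapped sequences."""
    if len(a) != len(b):
        raise ValueError("sequences must have the same length")
    matches = sum(1 for x, y in zip(a, b) if x == y)
    return 100.0 * matches / len(a)


# ORF140 (SEQ ID 584), 87 residues.
orf140 = (
    "MDGWTQTLSAQTLLGISAAAIILILILIVRFRIHALLTLVIVSLLTALAT"
    "GLPTGSIVKDILVKNFGGTLGGVALLVGLGAMLERLV"
)
# First 87 residues of ORF140a (SEQ ID 588).
orf140a_87 = (
    "MDGWTQTLSAQTLLGISAAAIILILILIVKFRIHALLTLVIVSLLTALAT"
    "GLPTGSIVNDVLVKNFGGTLGGVALLVGLGAMLGRLV"
)

print(f"{percent_identity(orf140, orf140a_87):.1f}% identity over {len(orf140)} aa")
```

The two sequences differ at four positions (30, 59, 61 and 84), that is 83 identical residues out of 87, which is consistent with the 95.4% figure.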

The complete length ORF140a nucleotide sequence <SEQ ID 587> is: 



1 ATGGACGGCT GGACACAGAC GCTGTCCGCG CAAACCCTGT TGGGCATTTC 

51 GGCGGCGGCA ATCATCCTCA TTCTGATTTT AATCGTCAAA TTCCGCATCC 

101 ACGCGCTGCT GACACTGGTC ATCGTCAGCC TGCTGACGGC TTTGGCAACC 

151 GGTTTGCCCA CAGGCAGCAT TGTCAACGAC GTACTGGTCA AAAACTTCGG 

201 CGGCACGCTC GGCGGCGTGG CGCTTCTGGT CGGCCTGGGC GCGATGCTCG 

251 GACGTTTGGT CGAAACATCC GGCGGCGCAC AGTCGCTGGC GGACGCGCTG 

301 ATCCGGATGT TCGGCGAAAA ACGCGCACCG TTCGCGCTGG GCGTTGCCTC 

351 GCTGATTTTC GGCTTCCCGA TTTTCTTCGA TGCCGGACTA ATCGTCATGC 

401 TGCCCATCGT GTTCGCCACC GCACGGCGCA TGAAACAGGA CGTACTGCCC 

451 TTCGCGCTTG CCTCCATCGG CGCATTTTCC GTCATGCACG TCTTCCTGCC 
501 GCCCCATCCG GGCCCGATTG CCGCTTCCGA ATTTTACGGC GCGAACATCG

551 GCCAAGTTTT GATTTTGGGT CTGCCGACCG CCTTCATCAC ATGGTATTTC

601 AGCGGCTATA TGCTCGGCAA AGTGTTGGGG CGCACCATCC ATGTTCCCGT

651 TCCCGAACTG CTCAGCGGCG GCACGCAAGA CAACGACCTG CCGAAAGAAC

701 CTGCCAAAGC AGGAACGGTC GTCGCCATCA TGCTGATTCC CATGCTGCTG

751 ATTTTCCTGA ATACCGGCGT ATCGGCCCTC ATCAGCGAAA AACTCGTAAG

801 TGCGGACGAA ACCTGGGTTC AGACGGCAAA AATAATCGGT TCGACACCGA

851 TCGCCCTTCT GATTTCCGTA TTGGTCGCAC TGTTTGTCTT GGGACGCAAA

901 CGCGGCGAAA GCGGCAGCGC GTTGGAAAAA ACCGTGGACG GCGCACTCGC

951 CCCCGTCTGT TCCGTGATTC TGATTACCGG CGCGGGCGGT ATGTTCGGCG

1001 GCGTTTTGCG CGCTTCCGGC ATCGGCAAGG CACTCGCCGA CAGCATGGCG

1051 GATTTGGGCA TTCCCGTCCT TTTGGGCTGT TTCCTTGTCG CCTTGGCACT

1101 GCGTATCGCG CAAGGTTCGG CAACCGTCGC CCTGACCACC GCCGCCGCGC

1151 TGATGGCTCC TGCCGTTGCC GCCGCCGGCT TTACCGACTG GCAGCTCGCC

1201 TGTATCGTAT TGGCAACGGC GGCAGGTTCG GTCGGTTGCA GCCACTTCAA

1251 CGACTCCGGC TTCTGGCTGG TCGGCCGCCT CTTGGACATG GACGTACCGA

1301 CCACGCTGAA AACCTGGACG GTCAACCAAA CCCTCATCGC ACTCATCGGC

1351 TTTGCCTTGT CCGCACTGCT GTTCGCCATC GTCTGA



This encodes a protein having amino acid sequence <SEQ ID 588>: 



1 MDGWTQTLSA QTLLGISAAA IILILILIVK FRIHALLTLV IVSLLTALAT

51 GLPTGSIVND VLVKNFGGTL GGVALLVGLG AMLGRLVE TS GGAQSLADAL

101 IRMFGEKRAP FALGVAS LIF GFPIFFDAGL IVML PIVFAT ARRMKQDVLP

151 FALASIGAFS VMHV FLPPHP GPIAASEFYG ANIGQVLILG LPTAFITWYF

201 SGYMLGKVLG RTIHVPVPEL LSGGTQDNDL PKEPA KAGTV VAIMLIPMLL

251 IFLNTGVSAL ISEKLVSADE TWVQTAKIIG S TPIALLISV LVALFVLG RK

301 RGESGSALEK TVDGALAPVC SVILITGAGG MFGGVL RASG IGKALADSMA

351 DLG IPVLLGC FLVALALRIA QGSA TVALTT AAALMAPAVA AA GFTDWQLA

401 CIVLATAAGS VGCSHFNDSG FWLVGRLLDM DVPTTLKTWT VNQTLIALIG

451 FALSALLFAI V*



ORF140a and ORF140-1 show 99.8% identity over a 461 aa overlap: 



orf140-1.pep   MDGWTQTLSAQTLLGISAAAIILILILIVKFRIHALLTLVIVSLLTALATGLPTGSIVND 60
               ||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
orf140a        MDGWTQTLSAQTLLGISAAAIILILILIVKFRIHALLTLVIVSLLTALATGLPTGSIVND 60

orf140-1.pep   ILVKNFGGTLGGVALLVGLGAMLGRLVETSGGAQSLADALIRMFGEKRAPFALGVASLIF 120
               :|||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
orf140a        VLVKNFGGTLGGVALLVGLGAMLGRLVETSGGAQSLADALIRMFGEKRAPFALGVASLIF 120

orf140-1.pep   GFPIFFDAGLIVMLPIVFATARRMKQDVLPFALASIGAFSVMHVFLPPHPGPIAASEFYG 180
               ||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
orf140a        GFPIFFDAGLIVMLPIVFATARRMKQDVLPFALASIGAFSVMHVFLPPHPGPIAASEFYG 180

orf140-1.pep   ANIGQVLILGLPTAFITWYFSGYMLGKVLGRTIHVPVPELLSGGTQDNDLPKEPAKAGTV 240
               ||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
orf140a        ANIGQVLILGLPTAFITWYFSGYMLGKVLGRTIHVPVPELLSGGTQDNDLPKEPAKAGTV 240

orf140-1.pep   VAIMLIPMLLIFLNTGVSALISEKLVSADETWVQTAKIIGSTPIALLISVLVALFVLGRK 300
               ||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
orf140a        VAIMLIPMLLIFLNTGVSALISEKLVSADETWVQTAKIIGSTPIALLISVLVALFVLGRK 300

orf140-1.pep   RGESGSALEKTVDGALAPVCSVILITGAGGMFGGVLRASGIGKALADSMADLGIPVLLGC 360
               ||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
orf140a        RGESGSALEKTVDGALAPVCSVILITGAGGMFGGVLRASGIGKALADSMADLGIPVLLGC 360

orf140-1.pep   FLVALALRIAQGSATVALTTAAALMAPAVAAAGFTDWQLACIVLATAAGSVGCSHFNDSG 420
               ||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
orf140a        FLVALALRIAQGSATVALTTAAALMAPAVAAAGFTDWQLACIVLATAAGSVGCSHFNDSG 420

orf140-1.pep   FWLVGRLLDMDVPTTLKTWTVNQTLIALIGFALSALLFAIV 461
               |||||||||||||||||||||||||||||||||||||||||
orf140a        FWLVGRLLDMDVPTTLKTWTVNQTLIALIGFALSALLFAIV 461

Homology with a predicted ORF from N. gonorrhoeae

ORF140 shows 92% identity over an 87aa overlap with a predicted ORF (ORF140ng) from
N. gonorrhoeae:
orf140.pep MDGWTQTLSAQTLLGISAAAIILILILIVRFRIHALLTLVIVSLLTALATGLPTGSIVKD 60

lit IIIMMMIIIIIIIIilllllll:lll:llllill:ltiillMIIMIill:l 
orf140ng MDGRTQTLSAQTLLGISAAAIILILILIVKFRIRALLTLVIASLLTALATGLPTGSIVND 60

orf140.pep ILVKNFGGTLGGVALLVGLGAMLERLV 87

: I I I I i t I t I I M I I I I I I I I I I III 
orf140ng VLVKNFGGTLGGVALLVGLGAMLGRLVETSGGAQSLADALIRMFGEKRAPFAPGVASLIF 120

The complete length ORF140ng nucleotide sequence <SEQ ID 589> was predicted to encode a 
protein having amino acid sequence <SEQ ID 590>: 



1 MDGRTQTLSA QTLLGISAAA IILILILIVK FRIRALLTLV IASLLTALAT

51 GLPTGSIVND VLVKNFGGTL GGVALLVGLG AMLGRLV ETS GGAQSLADAL 

101 IRMFGEKRAP FAPGVAS LIF GFPIFFDAGL IVML PIVFAT ARRMKQDVLP

151 FALASVGAFS VMHV FLPPHP GPIAASEFYG ANIGQVLILG LPTAFITWYF 

201 SGYMLGKVLG RAIHVPVPEL LSGGTQDSDP PKEPA KAGTV VAVMLIPMLL 

251 IFLNTGVSAL ISEKLVSADE TWVQTAKMIG S TPVALLISV LAALLVLGR K 

301 RGESGSTLEK TVDGALAP AC SVILITGAGG MFGGVL RASG IGKALADSMA 

351 DLG IPVLLGC FLVALALRIA QGSAT VALTT AAALMAPAVA AA GFTDWQLA

401 CIVLATAAGS VGCSHFNDSG FWLVGRLSDM DVPTTLKTWT VNQT LIAFIG 

451 FALSALLFAI V * 

Further work revealed a variant gonococcal DNA sequence <SEQ ID 591>: 



1 ATGGACGGCC GGACACAGAC GCTGTCCGCG CAAACCTTGT TGGGCATTTC 

51 GGCGGCGGCA ATCATCCTCA TTCTGATTTT AATCGTCAAA TTCCGCATCC 

101 GCGCGCTGCT GACACTGGTC ATCGCCAGCC TGCTGACGGC TTTGGCAACC 

151 GGTTTGCCCA CAGGCAGCAT CGTCAACGAC GTACTGGTCA AAAACTTCGG 

201 CGGCACGCTC GGCGGCGTGG CGCTTCTGGT CGGTCTGGGC GCAATGCTCG 

251 GACGTTTGGT AGAAACATCC GGCGGCGCAC AGTCGCTGGC GGACGCGCTG 

301 ATCCGGATGT TCGGCGAAAA ACGCGCACCG TTCGCTCCGG GCGTTGCCTC 

351 GCTGATTTTC GGCTTCCCGA TTTTCTTCGA TGCCGGACTA ATCGTCATGC 

401 TGCCCATCGT ATTCGCCACC GCACGGCGCA TGAAACAGGA CGTACTGCCC 

451 TTCGCGCTTG CCTCCGTCGG CGCATTTTCC GTCATGCACG TCTTCCTGCC

501 GCCCCATCCG GGCCCGATTG CCGCTTCCGA ATTTTACGGC GCGAACATCG 

551 GCCAGGTTTT GATTTTGGGT CTGCCGACCG CCTTCATCAC ATGGTATTTC 

601 AGCGGCTATA TGCTCGGCAA AGTGTTGGGG CGCGCCATCC ATGTTCCCGT 

651 TCCCGAACTG CTCAGCGGCG GCACGCAAGA CAGCGACCCG CCGAAAGAAC 

701 CTGCCAAAGC AGGAACGGTC GTCGCCGTCA TGCTGATTCC CATGCTGCTG 

751 ATTTTCCTGA ATACCGGCGT ATCAGCCCTC ATCAGCGAAA AACTCGTAAG 

801 TGCGGACGAA ACTTGGGTTC AGACGGCAAA AATGATCGGT TCGACACCTG 

851 TCGCCCTTCT GATTTCCGTA TTGGCCGCAC TGTTGGTCTT GGGACGCAAA 

901 CGCGGCGAAA GCGGCAGCAC GTTGGAAAAA ACCGTGGACG GCGCACTCGC 

951 CCCCGCCTGT TCCGTGATTC TGATTACCGG CGCGGGCGGT ATGTTCGGCG 

1001 GCGTTTTGCG CGCTTCCGGC ATCGGCAAGG CACTCGCCGA CAGCATGGCG 

1051 GATTTGGGCA TTCCCGTCCT TTTGGGCTGC TTCCTTGTCG CCTTGGCACT 

1101 GCGTATCGCG CAAGGTTCGG CAACCGTCGC CCTGACCACA GCCGCCGCGC 

1151 TGATGGCTCC TGCCGTTGCC GCCGCCGGCT TTACCGACTG GCAGCTCGCC 

1201 TGTATCGTAT TGGCAACGGC GGCAGGTTCG GTCGGTTGCA GCCACTTCAA 

1251 CGACTCCGGC TTCTGGCTGG TCGGCCGCCT CTTGGATATG GACGTACCGA 

1301 CCACGCTGAA AACCTGGACG GTCAACCAAA CCCTCATCGC ATTCATCGGC 

1351 TTTGCCTTGT CCGCACTGCT GTTTGCCATC GTCTGA 

This corresponds to the amino acid sequence <SEQ ID 592; ORF140ng-1>:



1 MDGRTQTLSA QTLLGISAAA IILILILIVK FRIRALLTLV IASLLTALAT

51 GLPTGSIVND VLVKNFGGTL GGVALLVGLG AMLGRLV ETS GGAQSLADAL 

101 IRMFGEKRAP FAPGVAS LIF GFPIFFDAGL IVML PIVFAT ARRMKQDVLP 

151 FALASVGAFS VMHV FLPPHP GPIAASEFYG ANIGQVLILG LPTAFITWYF 

201 SGYMLGKVLG RAIHVPVPEL LSGGTQDSDP PKEPA KAGTV VAVMLIPMLL 

251 IFLNTGVSAL ISEKLVSADE TWVQTAKMIG S TPVALLISV LAALLVLGR K 

301 RGESGSTLEK TVDGALAPAC SVILITGAGG MFGGVL RASG IGKALADSMA 

351 DLG IPVLLGC FLVALALRIA QGSAT VALTT AAALMAPAVA AA GFTDWQLA 

401 CIVLATAAGS VGCSHFNDSG FWLVGRLLDM DVPTTLKTWT VNQT LIAFIG

451 FALSALLFAI V * 

ORF140ng-1 and ORF140-1 show 96.3% identity over a 461aa overlap:



orf140ng-1.pep MDGRTQTLSAQTLLGISAAAIILILILIVKFRIRALLTLVIASLLTALATGLPTGSIVND
IM IIIMIIMtllltllllllllMltlll:lllllll:llllllllllllllttn 
orf140-1 MDGWTQTLSAQTLLGISAAAIILILILIVKFRIHALLTLVIVSLLTALATGLPTGSIVND



orf140ng-1.pep VLVKNFGGTLGGVALLVGLGAMLGRLVETSGGAQSLADALIRMFGEKRAPFAPGVASLIF
: I I I t t I I I I M i I I I I I i I I I I I I I I I I I I I I I I I It M I I i 1 I I I I I I I t I I I 1 I I I 
orf140-1 ILVKNFGGTLGGVALLVGLGAMLGRLVETSGGAQSLADALIRMFGEKRAPFALGVASLIF






orf 140ng-l .pep GFPIFFDAGLIVMLPIVFATARRMKQDVLPFALASVGAFSVMHVFLPPHPGPIAASEFYG 
IMIII[llittlllllllllllll)lllllllil:lltlllllllttilMlllltlll 
orf 140-1 GFPIFFDAGLIVMLPIVFATARRMKQDVLPFALASIGAFSVMHVFLPPHPGPIAASEFYG 

orf 140ng-l . pep ANIGQVLILGLPTAFITWYFSGYMLGKVLGRAIHVPVPELLSGGTQDSDPPKEPAKAGTV 
liltllll[lllltlltllllllllitltl|:|illMMIIIIIII:| llllllllll 
orf 140-1 ANIGQVLILGLPTAFITWYFSGYMLGKVLGRTIHVPVPELLSGGTQDNDLPKEPAKAGTV 

orf 140ng-l .pep VAVMLIPMLLIFLNTGVSALISEKLVSADETWVQTAKMIGSTPVALLISVLAALLVLGRK 
||:|||||l|[|lltilMII)lltlllltllltni:|lttl:|llllll:|l:|ll[l 
orf140-1 VAIMLIPMLLIFLNTGVSALISEKLVSADETWVQTAKIIGSTPIALLISVLVALFVLGRK






orf140ng-1.pep RGESGSTLEKTVDGALAPACSVILITGAGGMFGGVLRASGIGKALADSMADLGIPVLLGC
il||||:|ltMllltll:llllllfllllllllllllltltllltllltllllllllli 
orf140-1 RGESGSALEKTVDGALAPVCSVILITGAGGMFGGVLRASGIGKALADSMADLGIPVLLGC






orf140ng-1.pep FLVALALRIAQGSATVALTTAAALMAPAVAAAGFTDWQLACIVLATAAGSVGCSHFNDSG
I I I I i I i I I I I It I I I M I I 1 I t I I t I [ I I I I I I I I t I I I I I I I I I I I I I M I I I I I I I I 
orf 140-1 FLVALALRIAQGSATVALTTAAALMAPAVAAAGFTDWQLACIVLATAAGSVGCSHFNDSG 



orf 140ng-l .pep FWLVGRLLDMDVPTTLKTWTVNQTLIAFIGFALSALLFAIV 
i I It II I I i 1 M t II I t I II I I M I t I : I I I I II I II II I I 
orf 140-1 FWLVGRLLDMDVPTTLKTWTVNQTLIALIGFALSALLFAIV 

Furthermore, ORF140ng-1 is homologous to an E. coli protein:

gi|882633 (U29579) ORF_o454 [Escherichia coli] >gi|1789097 (AE000358) o454;
This 454 aa ORF is 34% identical (9 gaps) to 444 residues of an approx. 456 aa
protein GNTP_BACLI SW: P46832 [Escherichia coli] Length = 454
Score = 210 bits (529), Expect = 1e-53

Identities = 130/384 (33%), Positives = 194/384 (49%), Gaps = 19/384 (4%)



Query: 88    Sbjct: 80
Query: 148   Sbjct: 140
Query: 208   Sbjct: 199
Query: 258   Sbjct: 256
Query: 318   Sbjct: 313
Query: 378   Sbjct: 371
Query: 438   Sbjct: 431

E SGGA+SLA+
L F L
G+KR
+A+ G P+FFD G I++ PI++ A+ K
+HV +PPHPGP+AA+
A+IG + I+G+ + I
GY
E+L

-SGGTQDSDPPKEPAKAGTVVAVMLIPMLLIFLNTGV 257
G T+ SD P A V ++++IP+ +I T
SEGATECLSDKINPPGVA-LVTSLIVIPIAIIMAGT-- 255

+S L+
+ T ++IGS
+RG S
AL

318 PACSVILITGAGGMFGGVLRASGIGKALADSMADLGIPVLLGCFLVALALRIAQGSXXXX 377
A VIL+TGAGG+FG VL SG+GKALA+ + + +P+L F+++LALR +QGS
313 TAAWILVTGAGGVFGECVLVESGVGKALANMLQMIDLPLLPAAFIISLALRASQGS-- AT 370

+ LA G +G SH NDSGFW+V + L + V
LK
TWTV T++ F GF ++ ++A++

454



Based on this analysis, including the identification of a putative leader sequence
(double-underlined) and several putative transmembrane domains (single-underlined) in the
gonococcal protein, it is predicted that the proteins from N. meningitidis and N. gonorrhoeae, and
their epitopes, could be useful antigens for vaccines or diagnostics, or for raising antibodies.

Example 71 

The following partial DNA sequence was identified in N. meningitidis <SEQ ID 593>:

1 ..GATTTCGGCA TATCGCCCGT GTATCTTTGG GTTGCCGCCG CGTTCAAACA 

51 TTTGCTGTCG CCGTGGGCTG CCGACTCATA CGATGTCGCA CGCTTTGCAG 

101 GCGTATTTTT TGCCGTTATC GGACTGACTT CCTGCGGCTT TGCCGGTTTC 

151 AACTTTTTGG GCAGACACCA CGGGCGCAC. GTCGTCCTGA TTCTCATCGG 

201 CTGTATCGGG CTGATTCCAG TTGCCCATTT CCTCAACCCC GCTGCCGCCG 

251 CCTTTGCCGC CGCCGGACTG GTGCTGCACG GTTATTCTTT GGCTCGCCGG 

301 CGCGTGATTG CCGCCTCTTT TCTGCTCGGT ACGGGCTGGA CGCTGATGTC 

351 GTTGGCAGCA GCTTATCCGG CAGCATTTGC CCTGATGCTG CCCTTGCCCG 

401 TACTGATGTT TTTCCGTCCG . . 

This corresponds to the amino acid sequence <SEQ ID 594; ORF141>: 



1 ..DFGISPVYLW VAAAFKHLLS PWAADSYDVA RFAGVFFAVI GLTSCGFAGF
51 NFLGRHHGRX VVLILIGCIG LIPVAHFLNP AAAAFAAAGL VLHGYSLARR
101 RVIAASFLLG TGWTLMSLAA AYPAAFALML PLPVLMFFRP..

Further work revealed the complete nucleotide sequence <SEQ ID 595>: 



1 ATGCTGACCT ATACCCCGCC CGATGCCCGC CCGCCCGCCA AAACCCACGA 

51 AAAGCCGTGG CTGCTGCTGT TGATGGCGTT TGCCTGGTTG TGGCCCGGCG 

101 TGTTTTCCCA CGATTTGTGG AATCCTGACG AACCTGCCGT CTATACCGCC 

151 GTCGAAGCAC TGGCAGGCAG CCCCACCCCC TTGGTTGCCC ATCTGTTCGG 

201 TCAAACCGAT TTCGGCATAC CGCCCGTGTA TCTTTGGGTT GCCGCCGCGT 

251 TCAAACATTT GCTGTCGCCG TGGGCTGCCG ACTCATACGA TGCCGCACGC 

301 TTTGCAGGCG TATTTTTTGC CGTTATCGGA CTGACTTCCT GCGGCTTTGC 

351 CGGTTTCAAC TTTTTGGGCA GACACCACGG GCGCAgCGTC GTCCTGATTC 

401 TCATCGGCTG TATCGGGCTG ATTCCAGTTG CCCATTTCCT CAACCCCGCT 

451 GCCGCCGCGT TTGCCGCCGC CGGACTGGTG CTGCACGGTT ATTCTTTGGC 

501 TCGCCGGCGC GTGATTGCCG CCTCTTTTCT GCTCGGTACG GGCTGGACGC 

551 TGATGTCGTT GGCAGCAGCT TATCCGGCAG CATTTGCCCT GATGCTGCCC 

601 TTGCCCGTAC TGATGTTTTT CCGTCCGTGG CAAAGCAGGC GTTTGATGTT 

651 GACGGCAGTC GCCTCACTTG CCTTTGCCCT GCCGCTTATG ACCGTTTACC 

701 CGCTGCTCTT GGCAAAAACG CAGCCCGCGC TGTTCGCGCA ATGGCTCGAC 

751 TATCACGTTT TCGGTACGTT CGGCGGCGTG CGGCACGTTC AGACGGCATT 

801 CAGTTTGTTT TACTATCTGA AAAACCTGCT TTGGTTTGCA TTGCCCGCGC 

851 TGCCGCTGGC GGTTTGGACG GTTTGCCGCA CGCGCCTGTT TTCGACCGAC 

901 TGGGGGATTT TGGGCGTCGT CTGGATGCTT GCCGTTTTGG TGCTGCTTGC 

951 CGTCAATCCG CAGCGTTTTC AGGATAACCT CGTCTGGCTG CTTCCGCCGC

1001 TTGCCCTGTT CGGCGCGGCG CAACTGGACA GCCTGAGGCG CGGCGCGGCG 

1051 GCGTTTGTCA ACTGGTTCGG CATTATGGCG TTCGGACTGT TTGCCGTGTT

1101 CCTGTGGACG GGCTTTTTCG CCATGAATTA CGGCTGGCCC GCCAAGCTTG 

1151 CCGAACGCGC CGCCTATTTC AGCCCGTATT ATGTTCCTGA TATCGATCCC 

1201 ATTCCGATGG CGGTTGCCGT ACTGTTCACA CCCTTGTGGC TGTGGGCGAT 

1251 TACCCGGAAA AACATACGCG GCAGGCAGGC GGTTACCAAC TGGGCGGCAG 

1301 GCGTTACCCT GACCTGGGCT TTGCTGATGA CGCTGTTCCT GCCGTGGCTG 

1351 GACGCGGCGA AAAGCCACGC GCCGGTCGTC CGGAGTATGG AGGCATCGCT 

1401 TTCCCCGGAA TTGAAACGGG AGCTTTCAGA CGGCATCGAG TGTATCGGCA 

1451 TAGGCGGCGG CGACCTGCAC ACGCGGATTG TTTGGACGCA GTACGGCACA 

1501 TTGCCGCACC GCGTCGGCGA TGTACAATGC CGCTACCGCA TCGTCCTCCT 

1551 GCCCCAAAAT GCGGATGCGC CGCAAGGCTG GCAGACGGTT TGGCAGGGTG 

1601 CGCGTCCGCG CAACAAAGAC AGTAAGTTCG CACTGATACG GAAAATCGGG 

1651 GAAAATATAT AA 

This corresponds to the amino acid sequence <SEQ ID 596; ORF141-1>:

1 MLTYTPPDAR PPAKTHEKPW LLLLMAFAWL WPGVFS HDLW NPDEPAVYTA 

51 VEALAGSPTP LVAHLFGQTD FGIPPVYLWV AAAFKHLLSP WAADSYDAAR 

101 FAGVFFAVIG LTSCGFAG FN FLGRHHGRS V VLILIGCIGL IPVAHF LNPA

151 AAAFAAAGLV LHGYSLARRR VIAASFLLGT GWTLMSLA AA YPAAFALMLP 

201 LPVLMFFRPW QSRRLMLTAV ASLAFALPLM TVYPLLLAKT QPALFAQWLD 
251 YHVFGTFGGV RHVQTAFSLF YYLKNLLWFA LPALPLAVWT VCRTRLFSTD
301 W GILGVVWML AVLVLLAV NP QRFQDNLVWL LPPLALFGAA QLDSLRRGAA
351 AFVNWFGIMA FGLFAVFLWT GFFAMNYGWP AKLAERAAYF SPYYVPDIDP 
401 IPMAVAVLFT PLWLWAI TRK NIRGRQAVTN WAAGVTLTWA LLMTLFL PWL 
451 DAAKSHAPVV RSMEASLSPE LKRELSDGIE CIGIGGGDLH TRIVWTQYGT 
501 LPHRVGDVQC RYRIVLLPQN ADAPQGWQTV WQGARPRNKD SKFALIRKIG 
551 ENI* 

Computer analysis of this amino acid sequence gave the following results: 
Homology with a predicted ORF from N. meningitidis (strain A)

ORF141 shows 95.0% identity over a 140aa overlap with an ORF (ORF141a) from strain A of
N. meningitidis:

10 20 30 

orf141.pep DFGISPVYLWVAAAFKHLLSPWAADSYDVA

I I I I I I i I t I I I I i M M t i I I t t t I: I 
orf141a WNPDEPAVYTAVEALAGSPTPLVAHLFGQIDFGIPPVYLWVAAAFKHLLSPWAADPYDAA
40 50 60 70 80 90 

40 50 60 70 80 90 

orf141.pep RFAGVFFAVIGLTSCGFAGFNFLGRHHGRXVVLILIGCIGLIPVAHFLNPAAAAFAAAGL
I I I [ M i I I : I I t I I I I I I M I I M I I I I I I I I I [ t t I I t I I : : I I I I M M I I I I I I I 
orf141a RFAGVFFAVVGLTSCGFAGFNFLGRHHGRSVVLILIGCIGLIPTVHFLNPAAAAFAAAGL
100 110 120 130 140 150 

100 110 120 130 140 

orf141.pep VLHGYSLARRRVIAASFLLGTGWTLMSLAAAYPAAFALMLPLPVLMFFRP
I I I I I M I M I I M I I I I I I I I I I I t M I I I I I I I I I I I I I I t i I I I I I I 
orf141a VLHGYSLARRRVIAASFLLGTGWTLMSLAAAYPAAFALMLPLPVLMFFRPWQSRRLMLTA
160 170 180 190 200 210 

orf141a VASLAFALPLMTVYPLLLAKTQPALFAQWLDDHVFGTFGGVRHIQTAFSLFYYLKNLLWF
220 230 240 250 260 270 

The complete length ORF141a nucleotide sequence <SEQ ID 597> is:

1 ATGCTGACCT ATACCCCGCC CGATGCCCGC CCGCCCGCCA AAACCCACGA 

51 AAAGCCGTGG CTGTTGCTGT TGATGGCGTT TGCCTGGTTG TGGCCCGGCG 

101 TGTTTTCCCA CGATTTGTGG AATCCTGACG AACCTGCCGT CTATACCGCC 

151 GTCGAAGCAC TGGCAGGCAG CCCCACCCCT TTGGTTGCCC ATCTGTTCGG 

201 TCAAATCGAT TTCGGCATAC CGCCCGTGTA TCTTTGGGTT GCCGCCGCGT 

251 TCAAACATTT GCTGTCGCCG TGGGCTGCCG ACCCGTATGA TGCCGCACGC 

301 TTTGCCGGCG TGTTTTTCGC CGTTGTCGGA CTGACTTCCT GCGGCTTTGC 

351 CGGTTTCAAC TTTTTGGGCA GACACCACGG GCGCAGCGTC GTCCTGATTC 

401 TCATCGGCTG TATCGGGCTG ATTCCGACCG TACACTTTCT CAACCCCGCT 

451 GCCGCCGCGT TTGCCGCCGC CGGACTGGTG CTGCACGGTT ATTCTTTGGC 

501 TCGCCGGCGC GTGATTGCCG CCTCTTTTCT GCTCGGTACG GGTTGGACGC 

551 TGATGTCGTT GGCAGCAGCT TATCCGGCGG CATTTGCCCT GATGCTGCCC 

601 CTGCCCGTGC TGATGTTTTT CCGTCCGTGG CAAAGCAGGC GTTTGATGTT 

651 GACGGCAGTC GCCTCGCTTG CCTTTGCCCT GCCGCTTATG ACCGTTTACC 

701 CGCTGCTCTT GGCAAAAACG CAGCCCGCGC TGTTCGCGCA ATGGCTCGAC 

751 GATCACGTTT TCGGTACGTT CGGCGGCGTG CGGCACATTC AGACGGCATT 

801 CAGTTTGTTT TACTATCTGA AAAACCTGCT TTGGTTTGCA TTGCCTGCGC 

851 TGCCGCTGGC GGTTTGGACG GTTTGCCGCA CGCGCCTGTT TTCGACCGAC 

901 TGGGGGATTT TGGGCGTCGT CTGGATGCTT GCCGTTTTGG TGCTGCTTGC 

951 CGTCAATCCG CAGCGTTTTC AGGATAACCT CGTCTGGCTG CTTCCGCCGC 

1001 TTGCCCTGTT CGGCGCGGCG CAACTGGACA GCCTGAGACG CGGCGCGGCG 

1051 GCGTTTGTCA ACTGGTTCGG CATTATGGCG TTCGGACTGT TTGCCGTGTT 

1101 CCTGTGGACG GGCTTTTTCG CCATGAATTA CGGCTGGCCC GCCAAGCTTG 

1151 CCGAACGCGC CGCCTATTTC AGCCCGTATT ATGTTCCTGA TATCGATCCC 

1201 ATTCCGATGG CGGTTGCCGT ACTGTTCACA CCCTTGTGGC TGTGGGCGAT 

1251 TACCCGCAAA AACATACGCG GCAGGCAGGC GGTTACCAAC TGGGCGGCAG 

1301 GCGTTACCCT GACCTGGGCT TTGCTGATGA CGCTGTTCCT GCCGTGGCTG 

1351 GACGCGGCGA AAAGCCACGC GCCCGTCGTC CGGAGTATGG AGGCATCGCT 

1401 TTCCCCGGAA TTAAAACGGG AGCTTTCAGA CGGCATCGAG TGTATCGACA 

1451 TAGGCGGCGG CGACCTACAC ACGCGGATTG TTTGGACGCA GTACGGCACA 

1501 TTGCCGCACC GCGTCGGCGA TGTACAATGC CGCTACCGCA TCGTCCGCTT 

1551 GCCCCAAAAC GCGGATGCGC CGCAAGGCTG GCAGACGGTC TGGCAGGGTG 



1601 CGCGCCCGCG CAACAAAGAC AGTAAGTTCG CACTGATACG GAAAACCGGG

1651 GAAAATATAT TAAAAACAAC AGATTGA 

This encodes a protein having amino acid sequence <SEQ ID 598>: 



1 MLTYTPPDAR PPAKTHEKPW LLLLMAFAWL WPGVFSHDLW NPDEPAVYTA

51 VEALAGSPTP LVAHLFGQID FGIPPVYLWV AAAFKHLLSP WAADPYDAAR

101 FAGVFFAVVG LTSCGFAGFN FLGRHHGRSV VLILIGCIGL IPTVHFLNPA

151 AAAFAAAGLV LHGYSLARRR VIAASFLLGT GWTLMSLAAA YPAAFALMLP

201 LPVLMFFRPW QSRRLMLTAV ASLAFALPLM TVYPLLLAKT QPALFAQWLD

251 DHVFGTFGGV RHIQTAFSLF YYLKNLLWFA LPALPLAVWT VCRTRLFSTD

301 WGILGVVWML AVLVLLAVNP QRFQDNLVWL LPPLALFGAA QLDSLRRGAA

351 AFVNWFGIMA FGLFAVFLWT GFFAMNYGWP AKLAERAAYF SPYYVPDIDP

401 IPMAVAVLFT PLWLWAITRK NIRGRQAVTN WAAGVTLTWA LLMTLFLPWL

451 DAAKSHAPVV RSMEASLSPE LKRELSDGIE CIDIGGGDLH TRIVWTQYGT

501 LPHRVGDVQC RYRIVRLPQN ADAPQGWQTV WQGARPRNKD SKFALIRKTG

551 ENILKTTD*

ORF141a and ORF141-1 show 98.2% identity in 553 aa overlap: 



or f 14 la . pep MLTYTPPDARPPAKTHEKPWLLLLMAFAWLWPGVFSHDLWNPDEPAVYTAVEALAGSPTP 
I I I I i I I I t t I I I I I I I I I I t [ I I i I I I I I I I I I I I I t I I I I I I i I I I I I I I I i I I I I t I 
orf 141-1 MLTYTPPDARPPAKTHEKPWLLLLMAFAWLWPGVFSHDLWNPDEPAVYTAVEALAGSPTP 



orf 141a . pep LVAHLFGQIDFGIPPVYLWVAAAFKHLLSPWAADPYDAARFAGVFFAWGLTSCGFAGFN 
I I I I I I I i I I I I I I I I I M t i I i I I I i I M i I I I I I i I I I I M I I I : i I I I I I I I I I I 
orf 141-1 LVAHLFGQTDFGIPPVYLWVAAAFKHLLSPWAADSYDAARFAGVFFAVIGLTSCGFAGFN 

orf 14 la. pep FLGRHHGRSWLILIGCIGLIPTVHFLNPAAAAFAAAGLVLHGYSLARRRVIAASFLLGT 
I I I I t I I I I I I I I 1 I I i I I I I I :: I I I ) I I I I I I I I I I I I t i I M M I M M I I I I I I I I 
orf 141-1 FLGRHHGRSWLILIGCIGLIPVAHFLNP7VAAAFAAAGLVLHGYSLARRRVIAASFLLGT 



orf 14 la . pep GWTIJyiSIAAAYPAAFAI^PLPVI^FFRPWQSRRIJ4LTAVASIAFALPLMTVYPLLLAKT 
t I I I I M I i t i M I I I I I I I t I I I I I I I I I M t I I I I I I I t I I t I t I I I i I I 1 I M I I I i 
orfl41-l GWTLMSIJVAAYPAAFALMLPLPVI^FFRPWQSRRI^LTAVASLAFALPLMTVYPLLLAKT 



orf 14 la. pep QPALFAQWLDDHVFGTFGGVRHIQTAFSLFYYLKNLLWFALPALPLAVWTVCRTRLFSTD 
I t I M I I I I I M I I I I 1 I I I I : I I I I I I It I I I I I I I I I i i I i I t I I I I I I I I I I I I i I 
orf 14 1-1 QPALFAQWLDYHVFGTFGGVRHVQTAFSLFYYLKNLLWFALPALPLAVWTVCRTRLFSTD 



or f 1 4 la . pep WGILGWWMLAVLVLLAVNPQRFQDNLVWLLPPLALPGAAQLDSLRRGAAAFVNWFGIMA 
i n I I I I I I I I t I I I I I I I I t I I I I t I I I I t I M I I I I I I t I I I I I I I I I I I I I I I I I I [ 
orf 14 1-1 WGILGWWMLAVLVLLAVNPQRFQDNLVWLLPPLALFGAAQLDSLRRGAAAFVNWFGIMA 



orf 141a. pep FGLFAVFLWTGFFAMNYGWPAKLAERAAYFSPYYVPD I DPI PMAVAVL FT PLWLWAITRK 
I t I I I I I M I I I I t I M i I I I 1 I I I I I I I I I I I I I I I I M i I I I I t M I I I I I I I t I I I t 
orf 141-1 FGLFAVFLWTGFFAMNYGWPAKLAERAAYFSPYYVPDIDPIPMAVAVLFTPLWLWAITRK 



orf 1 41a. pep NIRGRQAVTNWAAGVTLTWALLMTLFLPWLDAAKSHAPWRSMEASLSPELKRELSDGIE 
I I I I I I I I I I I I t I I i I I I I I I I I M I I I I I M I I I I I [ I I I I I I I I I t I t I t I i I I t I I 
orf 14 1-1 NIRGRQAVTNWAAGVTLTWALLMTLFLPWLDAAKSHAPWRSMEASLSPELKRELSDGIE 



orf 141a. pep CIDIGGGDLHTRIVWTQYGTLPHRVGDVQCRYRIVRLPQNADAPQGWQTVWQGARPRNKD 
II I II I I I I t II t M I t I II t I II i I II II II II I t II I I II I II I II I I M I It II I 
or fl 4 1 - 1 CIGIGGGDLHTRI VWTQYGTLPHRVGDVQCRYRI VLLPQNADAPQGWQTVWQGARPRNKD 



orf 141a . pep SKF7VLIRKTGENI 
I I I I I 1 II till 
orf 141-1 SKFALIRKIGENI 



Homology with a predicted ORF from N. gonorrhoeae

ORF141 shows 95% identity over a 140aa overlap with a predicted ORF (ORF141ng) from
N. gonorrhoeae:



orf 14 1 . pep DFGISPVYLWVAAAFKHLLSPWAADSYDVA 30 

till I II 1) II II t II II I I I It 11:1 
orfl41ng WNPAEPAVYTAVEALAGSPTPLVAHLFGQTDFGIPPVYLWVAAAFKHLLSPWAAHPYDAA 126 
orf141.pep RFAGVFFAVIGLTSCGFAGFNFLGRHHGRXVVLILIGCIGLIPVAHFLNPAAAAFAAAGL 90

I I I I I I I I I I M I I I I I I I I I ! I I [ i I I I I I I I I I I I I I I I I t I I : I I I I t M I I I t I 
orfl41ng RFAGVFFAVIGLTSCGFAGFNFLGRHHGRSWLIHIGCIGLIPVAHFFNPAAAAFAAAGL 186 

orf 141 .pep VLHGYSLARRRVIAASFLLGTGWTLMSLAAAYPAAFALMLPLPVLMFFRP 140 

n 1 1 1 1 1 1 i M 1 1 i 1 1 1 i 1 1 i 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 [ I It 1 1 

orf 141ng VLHGYSLARRRVIAASFLLGTGWTLMSLAAAYPAAFALMLPLPVLMFFRPWQSRRLMLTA 24 6 

An ORF141ng nucleotide sequence <SEQ ID 599> was predicted to encode a protein having amino 
acid sequence <SEQ ID 600>: 



1 MPSEAVSARP LCEYLLHLAI RPFLLTLMLT YTPPDARPPA KTHEKPWLLL

51 LMAFAWLWPG VFSHDLWNPA EPAVYTAVEA LAGSPTPLVA HLFGQTDFGI

101 PPVYLWVAAA FKHLLSPWAA HPYDAARFAG VFFAVIGLTS CGFAGFNFLG

151 RHHGRSVVLI HIGCIGLIPV AHFFNPAAAA FAAAGLVLHG YSLARRRVIA

201 ASFLLGTGWT LMSLAAAYPA AFALMLPLPV LMFFRPWQSR RLMLTAVASL

251 AFALPLMTVY PLLLAKTQPA LFAQWLNYHV FGTFGGVRHI QRAFSLFHYL

301 KNLLWFAPPG LPLAVWTVCR TRLFSTDWGI LGIVWMLAVL VLLAFNPQRF

351 QDNLVWLLPP LALFGAAQLD SLRRGAAAFV NWFGIMAFGL FAVFLWTGFF

401 AMNYGWPAKL AERAAYFSPY YVPDIDPIPM AVAVLFTPLW LWAITRKNIR

451 GRQAVTNWAA GVTLTWALLM TLFLPWLDAA KSHAPVVRSM EASFSPELKR

501 ELSDGIECIG IGGGDLHTRI VWTQYGTLPH RVGDVRCRYR IVRLPQNADA

551 PQGWQTVWQG ARPRNKDSKF ALIRKIGENI LKTTD*

Further work revealed the following gonococcal DNA sequence <SEQ ID 601>: 



1 ATGCTGACCT ATACCCCGCC CGATGCCCGC CCGCCCGCCA AAACCCACGA 

51 AAAACCGTGG CTGCTGCTGT TGATGGCGTT TGCCTGGCTG TGGCCCGGCG 

101 TGTTTTCCCA CGATTTGTGG AATCCTGCCG AACCTGCCGT CTATACCGCC 

151 GTCGAAGCAC TGGCAGGCAG CCCCACCCCC TTGGTTGCCC ATCTGTTCGG 

201 TCAAACCGAT TTCGGCATAC CGCCCGTGTA TCTTTGGGTT GCCGCCGCAT 

251 TCAAACATTT GCTGTCGCCG TGGGCAGCCG ACCCGTATGA TGCCGCACGC 

301 TTTGCAGGCG TATTTTTTGC CGTTATCGGA CTGACTTCTT GCGGCTTTGC 

351 CGGTTTCAAC TTTTTGGGCA GACACCACGG GCGCAGCGTT GTTTTAATCC 

401 ATATCGGCTG TATCGGGCTG ATTCCGGTTG CCCATTTCCT CAATCCcgcc 

451 gccgccgcct tTGCCGCCGC CGGACTGGTG CTGCacggct actcgctgGC 

501 ACGCCGGCGC GTGATtgccg cctctTtccT GCTCGGTACG GGTTGGACGT 

551 TGATGTCGCT GGCGGCAGCT TATCCGGCGG CGTTTGCGCT GATGCTGCCC 

601 CTGCCCGTGC TGATGTTTTT CCGTCCGTGG CAAAGCAGGC GTTTGATGTT 

651 GACGGCAGTC GCCTCGCTTG CCTTTGCCCT GCCGCTTATG ACCGTTTACC 

701 CGCTGCTCtt gGCAAAAACG CAGCCCGCGC TGTTTGCGCA ATGGCTCAAC 

751 TATCACGTTT TCGGTACGTt cggcgGCGTG CGGCAcaTTC AGAggGCatT 

801 Cagtttgttt cactatctgA AAaatctgct ttggttcgca ccgcccgggC 

851 TGCCGCTGGC GGTTTGGACG GTTTGCCGCA CACGCCTGTT TTCGACCGAC 

901 TGGGGGATTT TGGGCATTGT CTGGATGCTT GCCGTTTTGG TGCTGCTCGC 

951 CTTTAATCCG CAGCGTTTTC AAGACAACCT CGTCTGGCTG CTGCCGCCGC

1001 TTGCCCTGTT CGGCGCGGCG CAACTGGACA GCCTGAGGCG CGGCGCGGCG 

1051 GCTTTTGTCA ACTGGTTCGG CATTATGGCG TTCGGGCTGT TTGCCGTGTT 

1101 CCTGTGGACG GGCTTTTTCG CCATGAATTA CGGCTGGCCC GCCAAGCTTG 

1151 CCGAACGCGC CGCCTACTTC AGCCCGTATT ACGTTCCCGA CATCGATCCC 

1201 ATTCCGATGG CGGTTGCCGT ACTGTTCACA CCCTTGTGGC TGTGGGCGAT 

1251 TACCCGGAAA AACATACGCG GCAGGCAGGC GGTTACCAAC TGGGCGGCAG 

1301 GCGTTACCCT GACCTGGGCT TTGCTGATGA CGCTGTTCCT GCCGTGGCTG 

1351 GACGCGGCGA AAAGCCACGC GCCCGTCGTC CGGAGTATGG AGGCATCGTT 

1401 TTCCCCGGAA TTAAAACGGG AGCTTTCAGA CGGCATCGAG TGTATCGGCA 

1451 TAGGCGGCGG CGACCTGCAC ACGCGGATTG TTTGGACGCA GTACGGCACA 

1501 TTGCCGCACC GCGTCGGCGA TGTCCGTTGC CGCTACCGTA TCGTCCGCCT 

1551 GCCCCAAAAC GCGGATGCGC CGCAAGGCTG GCAGACGGTC TGGCAGGGTG 

1601 CGCGCCCGCG CAACAAAGAC AGTAAGTTTG CACTGATACG GAAAATCGGG 

1651 GAAAATATAT TAAAAACAAC AGATTGA 

This corresponds to the amino acid sequence <SEQ ID 602; ORF141ng-1>:



1 MLTYTPPDAR PPAKTHEKPW LLLLMAFAWL WPGVFSHDLW NPAEPAVYTA

51 VEALAGSPTP LVAHLFGQTD FGIPPVYLWV AAAFKHLLSP WAADPYDAAR

101 FAGVFFAVIG LTSCGFAGFN FLGRHHGRSV VLIHIGCIGL IPVAHFLNPA

151 AAAFAAAGLV LHGYSLARRR VIAASFLLGT GWTLMSLAAA YPAAFALMLP

201 LPVLMFFRPW QSRRLMLTAV ASLAFALPLM TVYPLLLAKT QPALFAQWLN

251 YHVFGTFGGV RHIQRAFSLF HYLKNLLWFA PPGLPLAVWT VCRTRLFSTD

301 WGILGIVWML AVLVLLAFNP QRFQDNLVWL LPPLALFGAA QLDSLRRGAA

351 AFVNWFGIMA FGLFAVFLWT GFFAMNYGWP AKLAERAAYF SPYYVPDIDP

401 IPMAVAVLFT PLWLWAITRK NIRGRQAVTN WAAGVTLTWA LLMTLFLPWL

451 DAAKSHAPVV RSMEASFSPE LKRELSDGIE CIGIGGGDLH TRIVWTQYGT

501 LPHRVGDVRC RYRIVRLPQN ADAPQGWQTV WQGARPRNKD SKFALIRKIG

551 ENILKTTD*

ORF141ng-1 and ORF141-1 show 97.5% identity in 553 aa overlap:

orf 141ng-l .pep MLTYTPPDARPPAKTHEKPWLLLLMAFAWLWPGVFSHDLWNPAEPAVYTAVEALAGSPTP 
I I t I I I I I I I I M I I I I I I I I ) M I I I I I I I I I I t I M i I I I I f I I t t I I M t I i I I I t 
orfl41-l MLTYTPPDARPPAKTHEKPWLLLLMAFAWLWPGVFSHDLWNPDEPAVYTAVEALAGSPTP 

orf 141ng-l . pep LVAHLFGQTDFGIPPVYLWVAAAFKHLLSPWAADPYDAARFAGVFFAVIGLTSCGFAGFN 
I I I i I I I I t I I I I I I M I I I I i I I I I I I I t I I i i I i I I I I I I I I I I I M t I I t I M I I I 
orf 141-1 LVAHLFGQTDFGIPPVYLWVAAAFKHLLSPWAADSYDAARFAGVFFAVIGLTSCGFAGFN 



orf141ng-1.pep  FLGRHHGRSVVLIHIGCIGLIPVAHFLNPAAAAFAAAGLVLHGYSLARRRVIAASFLLGT

I I I I i I I I I t I i I I I I I M I I i I I I I I I M I I I M I I i M I i I I I I M I I I I t t I I I I I 
orf141-1        FLGRHHGRSVVLILIGCIGLIPVAHFLNPAAAAFAAAGLVLHGYSLARRRVIAASFLLGT



orf141ng-1.pep  GWTLMSLAAAYPAAFALMLPLPVLMFFRPWQSRRLMLTAVASLAFALPLMTVYPLLLAKT
                ||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
orf141-1        GWTLMSLAAAYPAAFALMLPLPVLMFFRPWQSRRLMLTAVASLAFALPLMTVYPLLLAKT



orf 141ng-l .pep QPALFAQWLNYHVFGTFGGVRHIQRAFSLFHYLKNLLWFAPPGLPLAVWTVCRTRLFSTD 
I I I I I I i I I: I I I I I M I t I I I: I t I I I I : i I I I I I I I I I : I t I I I t I I M I I I I I I I 
orf141-1        QPALFAQWLDYHVFGTFGGVRHVQTAFSLFYYLKNLLWFALPALPLAVWTVCRTRLFSTD



orf 141ng-l .pep WGILGIVWMLAVLVLLAFNPQRFQDNLVWLLPPLALFGAAQLDSLRRGAAAFVNWFGIMA 
I I M I : I I I I I M I I I I I I I I I M I M i I I I I I t M i I I I I I I i I I I I t I I I I I [ I I t I 
orf 14 1-1 WGILGWWMLAVLVLLAVNPQRFQDNLVWLLPPLALFGAAQLDSLRRGAAAFVNWFGIMA 


orf 141ng-l .pep FGLFAVFLWTGFFAMNYGWPAKLAERAAYFSPYYVPDIDPIPMAVAVLFTPLWLWAITRK 
llltlllllilMllllttllllllltltMllilllllllllltlllllltlllilltl 
orf 141-1 FGLFAVFLWTGFFAMNYGWPAKLAERAAYFSPYYVPDIDPIPMAVAVLFTPLWLWAITRK 

orf141ng-1.pep  NIRGRQAVTNWAAGVTLTWALLMTLFLPWLDAAKSHAPVVRSMEASFSPELKRELSDGIE

lllllllllittllllllltlllllllllllilinillllMMhIIIIIMIIIIIt 

orf 14 1-1 NIRGRQAVTNWAAGVTLTWALLMTLFLPWLDAAKSHAPWRSMEASLSPELKRELSDGIE 

orf 14 lng-1. pep CIGIGGGDLHTRIVWTQYGTLPHRVGDVRCRYRIVRLPQNADAPQGWQTVWQGARPRNKD 
40 M t I I I I I I M I I I I I I I I I I I I I I I t I : I t I I I I I I I I i I i I i I I I I I I I I I I I I I I I 

orf 141-1 CIGIGGGDLHTRIVWTQYGTLPHRVGDVQCRYRIVLLPQNADAPQGWQTVWQGARPRNKD 

orf 14 lng-1. pep SKFALIRKIGENILKTTDX 
I I 1 t I I I I I 1 I t I 
orf141-1        SKFALIRKIGENIX

Based on the presence of several putative transmembrane domains in the gonococcal protein, it is
predicted that the proteins from N. meningitidis and N. gonorrhoeae, and their epitopes, could be
useful antigens for vaccines or diagnostics, or for raising antibodies.



Example 72 



The following partial DNA sequence was identified in N. meningitidis <SEQ ID 603>:



1 . . CAATCCGCCA AATGGTTATC GGGCCAAACT CTAGTCGGCA CAGCAATTGG 

51 GATACGCGGG CAGATAAAGC TTGGCGGCAA CCTGCATTAC GATATATTTA 

101 CCGGCCGCGC ATTGAAAAAG CCCGAATTTT TCCAATCAAG GAAATGGGCA

151 AGCGGTTTTC AGGTAGGCTA TACGTTTTAA 

This corresponds to the amino acid sequence <SEQ ID 604; ORF142>:

1 .. QSAKWLSGQT LVGTAIGIRG QIKLGGNLHY DIFTGRALKK PEFFQSRKWA

51 SGFQVGYTF*
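
As a rough illustration of how an amino acid sequence such as <SEQ ID 604> follows from a DNA sequence such as <SEQ ID 603>, the sketch below translates the first 30 nucleotides of the partial sequence in reading frame 1. It assumes the standard genetic code; the function name `translate` and the way the codon table is built are illustrative choices only, since the text does not say which software or translation table was used.

```python
from itertools import product

# Standard genetic code, codons enumerated in T, C, A, G order.
# Assumption: the source does not name the translation table that was used.
BASES = "TCAG"
AMINO = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {"".join(c): aa for c, aa in zip(product(BASES, repeat=3), AMINO)}


def translate(dna: str) -> str:
    """Translate DNA in frame 1; codons with ambiguous bases become 'X'."""
    dna = dna.upper().replace(" ", "")
    usable = len(dna) - len(dna) % 3  # ignore a trailing partial codon
    return "".join(CODON_TABLE.get(dna[i:i + 3], "X") for i in range(0, usable, 3))


# First 30 nucleotides of the partial sequence <SEQ ID 603>
print(translate("CAATCCGCCA AATGGTTATC GGGCCAAACT"))  # QSAKWLSGQT
```

The ten residues printed match the start of the ORF142 sequence listed above; a stop codon is rendered as `*`, following the listing's own convention.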



Further work revealed the complete nucleotide sequence <SEQ ID 605>:

1 ATGGATAATT CGGGTAGTGA GGCGACAGGA AAATACCAAG GAAATATCAC

51 TTTCTCTGCC GACAATCCTT TGGGACTGAG TGATATGTTC TATGTAAATT 

101 ATGGACGTTC GATTGGCGGT ACGCCCGATG AGGAAAGTTT TGACGGCCAT 

151 CGCAAAGAAG GCGGATCAAA CAATTACGCC GTACATTATT CAGCCCCTTT 

201 CGGTAAATGG ACATGGGCAT TCAATCACAA TGGCTACCGT TACCATCAGG 

251 CAGTTTCCGG ATTATCGGAA GTCTATGACT ATAATGGAAA AAGTTACAAT 

301 ACTGATTTCG GCTTCAACCG CCTGTTGTAT CGTGATGCCA AACGCAAAAC 

351 CTATCTCGGT GTAAAACTGT GGATGAGGGA AACAAAAAGT TACATTGATG 

401 ATGCCGAACT GACTGTACAA CGGCGTAAAA CTGCGGGTTG GTTGGCAGAA 

451 CTTTCCCACA AAGAATATAT CGGTCGCAGT ACGGCAGATT TTAAGTTGAA 

501 ATATAAACGC GGCACCGGCA TGAAAGATGC TCTGCGCGCG CCTGAAGAAG 

551 CCTTTGGCGA AGGCACGTCA CGTATGAAAA TTTGGACGGC ATCGGCTGAT 

601 GTAAATACTC CTTTTCAAAT CGGTAAACAG CTATTTGCCT ATGACACATC 

651 CGTTCATGCA CAATGGAACA AAACCCCGCT AACATCGCAA GACAAACTGG 

701 CTATCGGCGG ACACCACACC GTACGTGGCT TCGACGGTGA AATGAGTTTG 

751 TCTGCCGAGC GGGGATGGTA TTGGCGCAAC GATTTGAGCT GGCAATTTAA 

801 ACCAGGCCAT CAGCTTTATC TTGGGGCTGA TGTAGGACAT GTTTCAGGAC 

851 AATCCGCCAA ATGGTTATCG GGCCAAACTC TAGTCGGCAC AGCAATTGGG 

901 ATACGCGGGC AGATAAAGCT TGGCGGCAAC CTGCATTACG ATATATTTAC 

951 CGGCCGCGCA TTGAAAAAGC CCGAATTTTT CCAATCAAGG AAATGGGCAA 

1001 GCGGTTTTCA GGTAGGCTAT ACGTTTTAA 

This corresponds to the amino acid sequence <SEQ ID 606; ORF142-1>:

1 MDNSGSEATG KYQGNITFSA DNPLGLSDMF YVNYGRSIGG TPDEESFDGH 

51 RKEGGSNNYA VHYSAPFGKW TWAFNHNGYR YHQAVSGLSE VYDYNGKSYN 

101 TDFGFNRLLY RDAKRKTYLG VKLWMRETKS YIDDAELTVQ RRKTAGWLAE 

151 LSHKEYIGRS TADFKLKYKR GTGMKDALRA PEEAFGEGTS RMKIWTASAD

201 VNTPFQIGKQ LFAYDTSVHA QWNKTPLTSQ DKLAIGGHHT VRGFDGEMSL

251 SAERGWYWRN DLSWQFKPGH QLYLGADVGH VSGQSAKWLS GQTLVGTAIG

301 IRGQIKLGGN LHYDIFTGRA LKKPEFFQSR KWASGFQVGY TF*

Computer analysis of this amino acid sequence gave the following results:
Homology with a predicted ORF from N. gonorrhoeae

ORF142 shows 88.1% identity over a 59aa overlap with a predicted ORF (ORF142ng) from 
N. gonorrhoeae: 

orf142.pep                                QSAKWLSGQTLVGTAIGIRGQIKLGGNLHY 30
                                          |||||||||||:||||||||||||||||||
orf142ng    RGWYWRNDLSWQFKPGHQLYLGADVGHVSGQSAKWLSGQTLAGTAIGIRGQIKLGGNLHY 313

orf142.pep  DIFTGRALKKPEFFQSRKWASGFQVGYTF 59
            ||||||||||||:||::||::||||||:|
orf142ng    DIFTGRALKKPEYFQTKKWVTGFQVGYSF 342
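
The identity figure quoted for this overlap can be checked by counting matching positions. The sketch below is a minimal illustration, assuming an ungapped overlap of equal length; the helper name `percent_identity` is hypothetical, and the text does not state which alignment program produced the original figure. The two strings are the 59-residue ORF142 sequence and the C-terminal 59 residues of ORF142ng as given in the listings.

```python
def percent_identity(a: str, b: str) -> float:
    """Percentage of identical positions between two equal-length, ungapped sequences."""
    if len(a) != len(b):
        raise ValueError("sequences must have the same length")
    matches = sum(x == y for x, y in zip(a, b))
    return 100.0 * matches / len(a)


# ORF142 (59 aa partial sequence) and the C-terminal 59 residues of ORF142ng
orf142 = "QSAKWLSGQTLVGTAIGIRGQIKLGGNLHYDIFTGRALKKPEFFQSRKWASGFQVGYTF"
orf142ng_tail = "QSAKWLSGQTLAGTAIGIRGQIKLGGNLHYDIFTGRALKKPEYFQTKKWVTGFQVGYSF"

print(f"{percent_identity(orf142, orf142ng_tail):.1f}%")  # 88.1%
```

Seven of the 59 positions differ, so 52/59 residues are identical, which agrees with the 88.1% stated above.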

The complete length ORF142ng nucleotide sequence <SEQ ID 607> is: 

1 ATGGATAATT CGGGTAGTGA GGCGACAGGA AAATACCAAG GAAATATCAC 

51 TTTCTCTGCC GACAATCCTT TTGGACTGAG TGATATGTTC TATGTAAATT 

101 ATGGACGTTC AATTGGCGGT ACGCCCGATG AGGAAAATTT TGACGGCCAT 

151 CGCAAAGAAG GCGGATCAAA CAATTACGCC GTACATTATT CAGCCCCTTT

201 CGGTAAATGG ACATGGGCAT TCAATCACAA TGGCTACCGT TACCATCAGG 

251 CGGTTTCCGG ATTATCGGAA GTCTATGACT ATAATGGAAA AAGTTACAAC 

301 ACTGATTTCG GCTTCAACCG CCTGTTGTAT CGTGATGCCA AACGCAAAAC 

351 CTATCTCAGT GTAAAACTGT GGACGAGGGA AACAAAAAGT TACATTGATG 

401 ATGCCGAACT GACTGTACAA CGGCGTAAAA CCACAGGTTG GTTGGCAGAA 

451 CTTTCCCACA AAGGATATAT CGGTCGCAGT ACGGCAGATT TTAAGTTGAA 

501 ATATAAACAC GGCACCGGCA TGAAAGATGC TCTGCGCGCG CCTGAAGAAG 

551 CCTTTGGCGA AGGCACGTCA CGTATGAAAA TTTGGACGGC ATCGGCTGAT 

601 GTAAATACTC CTTTTCAAAT CGGTAAACAG CTATTTGCCT ATGACACATC 

651 CGTTCATGCA CAATGGAACA AAACCCCGCT AACATCGCAA GACAAACTGG 

701 CTATCGGCGG ACACCACACC GTACGTGGCT TCGACGGTGA AATGAGTTTG 

751 CCTGCCGAGC GGGGATGGTA TTGGCGCAAC GATTTGAGCT GGCAATTTAA 

801 ACCAGGCCAT CAGCTTTATC TTGGGGCTGA TGTAGGACAT GTTTCAGGAC 

851 AATCCGCCAA ATGGTTATCG GGCCAAACTC TAGCCGGCAC AGCAATTGGG

901 ATACGCGGGC AGATAAAGCT TGGCGGCAAC CTGCATTACG ATATATTTAC 

951 CGGCCGTGCA TTGAAAAAGC CCGAATATTT TCAGACGAAG AAATGGGTAA 
1001 CGGGGTTTCA GGTGGGTTAT TCGTTTTGA

This encodes a protein having amino acid sequence <SEQ ID 608>: 

1 MDNSGSEATG KYQGNITFSA DNPFGLSDMF YVNYGRSIGG TPDEENFDGH 

51 RKEGGSNNYA VHYSAPFGKW TWAFNHNGYR YHQAVSGLSE VYDYNGKSYN 

101 TDFGFNRLLY RDAKRKTYLS VKLWTRETKS YIDDAELTVQ RRKTTGWLAE 

151 LSHKGYIGRS TADFKLKYKH GTGMKDALRA PEEAFGEGTS RMKIWTASAD 

201 VNTPFQIGKQ LFAYDTSVHA QWNKTPLTSQ DKLAIGGHHT VRGFDGEMSL 

251 PAERGWYWRN DLSWQFKPGH QLYLGADVGH VSGQSAKWLS GQTLAGTAIG 

301 IRGQIKLGGN LHYDIFTGRA LKKPEYFQTK KWVTGFQVGY SF*

The underlined sequence (aromatic-Xaa-aromatic amino acid motif) is usually found at the
C-terminal end of outer membrane proteins.
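
A check for this kind of C-terminal motif can be sketched as follows. This is only an illustration under stated assumptions: the set of aromatic residues is taken to be Phe, Trp and Tyr, the motif is taken to be the final three residues, and the function name `has_cterminal_aromatic_motif` is hypothetical. The text does not say how the motif was identified.

```python
AROMATIC = set("FWY")  # assumption: Phe, Trp and Tyr count as aromatic


def has_cterminal_aromatic_motif(protein: str) -> bool:
    """True if the last three residues are aromatic-Xaa-aromatic."""
    seq = protein.rstrip("*")  # drop a trailing stop-codon marker
    if len(seq) < 3:
        return False
    return seq[-3] in AROMATIC and seq[-1] in AROMATIC


print(has_cterminal_aromatic_motif("KWVTGFQVGYSF*"))  # C-terminus of <SEQ ID 608>
print(has_cterminal_aromatic_motif("KWASGFQVGYTF*"))  # C-terminus of <SEQ ID 606>
```

Both sequences end in Tyr-Xaa-Phe (YSF and YTF), so both calls report the motif under these assumptions.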



ORF142ng and ORF142-1 show 95.6% identity over 342aa overlap: 

orf 142-1. pep MDNSGSEATGKYQGNITFSADNPLGLSDMFYVNYGRSIGGTPDEESFDGHRKEGGSNNYA 
M t I N I I I M I I I I I I I I I t I i: I I I I I I I I I I I I I t M I I I 1 t: M I I I I I 11 I I I I i 
orfl42ng-l MDNSGSEATGKYQGNITFSADNPFGLSDMFYVNYGRSIGGTPDEENFDGHRKEGGSNNYA 

orf 142-1 .pep VHYSAPFGKWTWAFNHNGYRYHQAVSGLSEVYDYNGKSYNTDFGFNRLLYRDAKRKTYLG 
t I I I I I t t I I t I I I 1 t t I I t t i I I I 1 I I I I I I I I It I I [ I I I il t I I I I I t It I t I I I I : 
orfl42ng-l VHYSAPFGKWTWAFNHNGYRYHQAVSGLSEVYDYNGKSYNTDFGFNRLLYRDAKRKTYLS 

orf 142-1 .pep VKLWMRETKSYIDDAELTVQRRKTAGWLAELSHKEYIGRSTADFKLKYKRGTGMKDALRA 
till I t t t t t I t I I I t II t t I I t : I I t t I I II t I t II II I t It It II : M I I I I I I 11 
orfl42ng-l VKLWTRETKS Y I DDAELTVQRRKTTGWLAELSHKGYIGRSTADFKLKYKHGTGMKDALRA 

orf 142-1 .pep PEEAFGEGTSRMKIWTASADVNTPFQIGKQLFAYDTSVHAQWNKTPLTSQDKLAIGGHHT 
) I I I I I t I I M I II M I 1 I M I I I I II I I t II I I I I I I I I I M i I II I I I I i I M I I i I I 
orfl42ng-l PEEAFGEGTSRMKIWTASADVNTPFQIGKQLFAYDTSVHAQWNKTPLTSQDKLAIGGHHT 



orf 142-1 . pep VRGFDGEMSLSAERGWYWRNDLSWQFKPGHQLYLGADVGHVSGQSAICWLSGQTLVGTAIG 
I I I I I I I I II I I II II M I I I I I I II I I I t I I I I II M I n I I I I I I I t I I II : I I I I I 
orfl42ng-l VRGFDGEMSL PAERGWYWRN DLSWQFKPGHQLYLGADVGHVSGQSAKWLSGQTLAGTAIG 



orf 142-1 .pep IRGOIKLGGNLHYDIFTGRALKKPEFFQSRKWASGFQVGYTF 
lllllllllllltlltltlllllll:||::ll::llllll:l 
orfl42ng-l IRGQIKLGGNLHYDIETGRALKKPEYFQTKKWVTGFQVGYSF 

In addition, ORF142ng is homologous to the HecB protein of E.chrysanthemi: 



gi|1772622 (L39897) HecB [Erwinia chrysanthemi] Length = 558
Score = 119 bits (295), Expect = 3e-26 

Identities = 88/346 (25%), Positives = 151/346 (43%), Gaps = 22/346 (6%) 

Query: 2 DNSGSEATGKYQGNITFSADNPFGLSDMFYVNYGRSIGGTPDEENFDGHRKEGGSNNYAV 61 

DNSG ++TG+ Q N + + DN FGL+D ++++ G S + + D + G 
Sbjct: 230 DNSGQKSTGEEQLNGSLALDNVFGLADQWFISAGHS SRFATSHDT^SLQAG 280 

Query: 62 HYSAPFGKWTWAFNHNGYRYHQAVSGLSEVYDYNGKSYNTDFGFNRLLYRDAKRKTYLSV 121 

+S P+G W +N++ RY + G S F +R+++RD KT ++ 

Sbjct: 281 -FSMPYGYWNLGYNYSQSRYRNTFINRDFPWHSTGDSDTHRFSLSRWFRDGTMKTAIAG 339 

Query: 122 KLWTRETKSYIDDAELTVQRRKTTGWLAELSHKGYIGRSTADFKLKYKHGTGMKDALRAP 181 

R +Y++ + L RK + ++H + A F Y G + 

Sbjct: 340 TFSQRTGNNYLNGSLLPSSSRKLSSVSLGVNHSQKLWGGLATFNPTYNRGVRWLGSETDT 399 

Query: 182 EEAFGEGTSRMKIWTASADVNTPFQIGKQLFAYDTSVHAQWNKTPLTSQDKLAIGGHHTV 241 

+++ E + WT SA P Y S++ Q++ L ++L +GG ++ 

Sbjct: 400 DKSADEPRAEFNKWTLSASYYHPV TDSITYLGSLYGQYSARALYGSEQLTLGGESSI 456 

Query: 242 RGFDGEMSLPAERGWYWRNDLSWQFKP GHQLYLGA-DVGHVSGQSAKWLSGQTLAG 296 

RGF E RG YWRN+L+WQ G+ ++ A D GH+ + +L G 

Sbjct: 457 RGF-REQYTSGNRGAYWRNELNWQAWQLPVLGNVTFMAAVEXSGHLYNHKQDNSTAASLWG 515 

Query: 297 TAIGIRGQIKLGGNLHYDIFTGRALKKPEYFQTKKWVTGFQVGYSF 342 
A+G+ + L + G + P + Q V G++VG SF 






Sbjct: 516 GAVGMTVASRW LSQQVTVGWPISYPAWLQPDTMWGYRVGLSF 558 

On the basis of this analysis, it is predicted that the proteins from N. meningitidis and
N. gonorrhoeae, and their epitopes, could be useful antigens for vaccines or diagnostics, or for
raising antibodies.

Example 73 

The following partial DNA sequence was identified in N. meningitidis <SEQ ID 609>:

1 ATGCGGACGA AATGGTCAGC AGTGAGAAGC TGCJIACTTG GgCGGACACC 

51 GCCGACATCG ATACCGCTTT GAACCTGTTG TACCGTTTGC AAAAACTCGA 

101 ATTCCTCTAT GGCGATGAAA ACGGTCATTC AGACGGCATC AATTTGwCGG 

151 ACGAGCAATT GCCGTTGCTG ATGGAACAAT TGTCCGGCAG CGGTAAGGCG 

201 TTATTGGTCG ATCGGAACGG TCTGTATCTT GCCAACGCCA ATTTCCATCA 

251 TGAGGCGGCG GAAGAGTTGG GGTTGTTGGC GGCAGAAGTC GCACAGATGG 

301 AAAAGAAATA CCGGCTGCTG ATTAAGAACA AC. . 

This corresponds to the amino acid sequence <SEQ ID 610; ORF143>: 

1 MRTKWSAVRS CTWADTADID TALNLLYRLQ KLEFLYGDEN GHSDGINLXD
51 EQLPLLMEQL SGSGKALLVD RNGLYLANAN FHHEAAEELG LLAAEVAQME 
101 KKYRLLIKNN . . 

Further work revealed the complete nucleotide sequence <SEQ ID 611>:

1 ATGGAATCAA CACTTTCACT ACAAGCAAAT TTATATCCCC GCCTGACTCC 

51 TGCCGGTGCA TTTTATGCCG TATCCAGCGA TGCCCCCAGT GCCGGTAAAA 

101 CTTTGTTGCA CAGCCTGTTG AAAGCAGATG CGGACGAAAT GGTCAGCAGT 

151 GAGAAGCTGC TTACTTGGGC GGACACCGCC GACATCGATA CCGCTTTGAA 

201 CCTGTTGTAC CGTTTGCAAA AACTCGAATT CCTCTATGGC GATGAAAACG 

251 GTCATTCAGA CGGCATCAAT TTGTCGGACG AGCAATTGCC GTTGCTGATG 

301 GAACAATTGT CCGGCAGCGG TAAGGCGTTA TTGGTCGATC GGAACGGTCT 

351 GTATCTTGCC AACGCCAATT TCCATCATGA GGCGGCGGAA GAGTTGGGGT 

401 TGTTGGCGGC AGAAGTCGCA CAGATGGAAA AGAAATACCG GCTGCTGATT 

451 AAGAACAACC TGTATATCAA CAATAACGCT TGGGGCGTTT GCGATCCTTC 

501 CGGTCAGAGC GAATTGACAT TTTTCCCATT GTATATCGGT TCAACCAAAT 

551 TTATTTTGGT TATCGGCGGC ATTCCCGATT TGGGCAAAGA GGCATTTGTT 

601 ACTTTGGTAA GGATTTTATA CCGCCGTTAC AGCAACCGCG TGTAA 

This corresponds to the amino acid sequence <SEQ ID 612; ORF143-1>:

1 MESTLSLQAN LYPRLTPAGA FYAVSSDAPS AGKTLLHSLL KADADEMVSS

51 EKLLTWADTA DIDTALNLLY RLQKLEFLYG DENGHSDGIN LSDEQLPLLM

101 EQLSGSGKAL LVDRNGLYLA NANFHHEAAE ELGLLAAEVA QMEKKYRLLI

151 KNNLYINNNA WGVCDPSGQS ELTFFPLYIG STKFILVIGG IPDLGKEAFV

201 TLVRILYRRY SNRV* 

Computer analysis of this amino acid sequence gave the following results: 
Homology with a predicted ORF from N. meningitidis (strain A)

ORF143 shows 92.4% identity over a 105aa overlap with an ORF (ORF143a) from strain A of N.
meningitidis:

10 20 30 

or -J 14 3 . pep MRTKWSAVRSCTWADTADIDTALNLLYRLQKLEFL 

I : : 111 I II II II I M I I I II I II M 
orfl43a GAFYAVSSDXPSAGFCTLLHSLLKADADEMVSSEKLLTWAXTADIDTALNLLYRLQKLEFL 
20 30 40 50 60 70 

40 50 60 70 80 90 

orfl43.pep YGDENGHSDGINLXDEQLPLLMEQLSGSGKALLVDRNGLYLANANFHHEAAEELGLLAAE 
lllllltlltlll IMIIIIIIMIllltl IIMIIIIIIItllllllllMI 
orf143a YGDENGHSDGINLSDEQLPLLMEQLSGSGKALLVDRNGLYLANANFHHEAAEELGLLAAE
80 90 100 110 120 130 

100 110 
orf 143.pep VAQMEKKYRLLIKNN 
I I I I I I I I I I MM 

or f 1 4 3a VAQMEKKYRLXIKNNLYINNNAWGVCDPSGQSELT FFPLYIGSTKFILVIGG I PDLGKEA 

140 150 160 170 180 190 

The complete length ORF143a nucleotide sequence <SEQ ID 613> is:

1 ATGGAATCAA CANTTTCACT ACAAGCAAAT TTATATCNCC GCCTGACTCC 

51 TGCCGGTGCA TTTTATGCCG TATCCAGCGA TGNCCCCAGT GCCGGTAAAA 

101 CTTTGTTGCA CAGCCTGTTG AAAGCGGATG CGGACGAAAT GGTNAGCAGT 

151 GAGAAGCTGC TTACCTGGGC GGANACCGCC GACATCGATA CCGCTTTGAA 

201 CCTGTTGTAC CGTTTGCAAA AACTCGAATT CCTCTATGGC GATGAAAACG 

251 GTCATTCAGA CGGCATCAAT TTGTCGGACG AGCAATTGCC GTTGCTGATG 

301 GAACAATTGT CCGGCAGCGG TAAGGCGTTA TTGGTCGATC GGAACGGTCT 

351 GTATCTTGCC AACGCCAATT TCCATCATGA GGCGGCGGAA GAGTTGGGGT 

401 TGTTGGCGGC AGAAGTCGCA CAGATGGAAA AGAAATACCG GCTGCNNATT 

451 AAGAACAACC TGTATATCAA CAATAACGCT TGGGGCGTTT GCGATCCTTC 

501 CGGTCAGAGC GAATTGACAT TTTTCCCATT GTATATCGGT TCAACCAAAT 

551 TTATTTTGGT TATCGGCGGC ATTCCCGATT TGGGCAAAGA GGCATTTGTT

601 ACTTTGGTAA GGATNTTATA CCNCCNGTTA CAGCAACCGC GTGTAAAACT 

651 TGGGAGAGAG GANGGGTTAT GCAGCAATTA TTGA 

This encodes a protein having amino acid sequence <SEQ ID 614>: 

1 MESTXSLQAN LYXRLTPAGA FYAVSSDXPS AGKTLLHSLL KADADEMVSS 

51 EKLLTWAXTA DIDTALNLLY RLQKLEFLYG DENGHSDGIN LSDEQLPLLM 

101 EQLSGSGKAL LVDRNGLYLA NANFHHEAAE ELGLLAAEVA QMEKKYRLXI 

151 KNNLYINNNA WGVCDPSGQS ELTFFPLYIG STKFILVIGG IPDLGKEAFV

201 TLVRXLYXXL QQPRVKLGRE XGLCSNY* 

ORF143a and ORF143-1 show 97.1% identity in 207 aa overlap: 

orf 143a . pep MESTXSLQANLYXRLTPAGAFYAVSSDXPSAGKTLLHSLLPCADADEMVSSEKLLTWAXTA 
MM I M M M M M M M M [ M t M M M I M M M M M I M M M M M I M 
orf 143-1 MESTLSLQANLYPRLTPAGAFYAVSSDAPSAGKTLLHSLLKADADEMVSSEKLLTWADTA 

orf 143a . pep DIDTALNLLYRLQKLEFLYGDENGHSDGINLSDEQLPLLMEQLSGSGKALLVDRNGLYLA 
M M M I M M M M M M M M M M M I M M M M M M M M M M M M M M M 
orf 143-1 DIDTALNLLYRLQKLEFLYGDENGHSDGINLSDEQLPLLMEQLSGSGKALLVDRNGLYLA 

orf 143a. pep NANFHHEAAEELGLLAAEVAQMEKKYRLXIKNNLYINNNAWGVCDPSGQSELTFFPLYIG 
M M M M M M M M t M M M [ M M M M M M M i M M M M M M M M M M 
orf 143-1 NANFHHEAAEELGLLAAEVAQMEKKYRLLIKNNLYINNNAWGVCDPSGQSELTFFPLYIG 

or f 1 4 3 a . pep STKFILVIGGI PDLGKEAFVTLVRXLY 
I M M M I M M M M M M M M M 
orf 143-1 STKFILVIGGI PDLGKEAFVTLVRXLY 

Homology with a predicted ORF from N. gonorrhoeae

ORF143 shows 95.5% identity over a 110aa overlap with a predicted ORF (ORF143ng) from
N. gonorrhoeae:

orf 14 3 . pep MRTKWSAVRSCTWADTADIDTALNLLYRLQKLEFLYGDENGHSDGINLXDEQLPLLMEQL 60 

I I M M M M M M M M M M M M M M M M M I M M M M M M M M M M I 
orfl43ng MRTKWSAVRSCSRADTADIDTALNLLYRLQKLEFLYGDENGHSDGINLSDEQLPLLMEQL 60 

or f 14 3 . pep SGSGKALLVDRNGLYLANANFHHEAAEELGLLAAEVAQMEKKYRLLIKNN 1 10 

M M M M M M M I M M M M I : M M M M M M M M t M M M M 
orfl43ng SGSGECALLVDRNGLYLANANFHHESAEELGLLAAEVAQMEKKYRLLIRNNLYINNNAWGV 120 

An ORF143ng nucleotide sequence <SEQ ID 615> was predicted to encode a protein having amino
acid sequence <SEQ ID 616>:

1 MRTKWSAVRS CSRADTADID TALNLLYRLQ KLEFLYGDEN GHSDGINLSD

51 EQLPLLMEQL SGSGKALLVD RNGLYLANAN FHHESAEELG LLAAEVAQME

101 KKYRLLIRNN LYINNNAWGV CDPSGQSELT FFPLYIGSTK FILVIAGIPD

151 LSKGGICYFG KDFIPPLQQP RVKLGTGGIM RQLLISILED LNNTSTDIIA

201 SAVISTDGLP MATMLPSHLN SDRVGAISAT LLALGSRSVQ ELACGELEQV

251 MIKGKSGYIL LSQAGKDAVL VLVAKETGRL GLILLDAKRA ARHIAEAI*



Further work revealed the following gonococcal DNA sequence <SEQ ID 617>: 

1 ATGGAATCAA CACTTTCACT ACAAGCGAAT TTATATCCCT GCCTGACTCC

51 TGCCGGTGCA TTTTATGCCG TATCCAGCGA TGCCCCCAGT GCCGGTAAAA 

101 CTTTGTTGCG CAGCCTGTTG AAAGCGGATG CGGACGAAGT GGTCAGCAGT 

151 GAGAAGCTGC TCGCGGCGGA CACCGCCGAC ATCGATACCG CTTTGAACCT 

201 GTTGTACCGT TTGCAAAAAC TCGAATTCCT CTATGGCGAT GAAAACGGTC 

251 ATTCAGACGG CATCAATTTG TCGGACGAGC AATTGCCGTT GCTGATGGAA 

301 CAATTGTCCG GCAGCGGTAA GGCATTATTG GTCGATCGGA ACGGTCTGTA 

351 TCTTGCCAAC GCCAATTTCC ATCATGAGTC GGCGGAAGAG TTGGGGTTGT 

401 TGGCGGCAGA AGTCGCACAG ATGGAAAAGA AATACCGGCT GCTGATTAGG 

451 AACAACCTGT ATATCAACAA TAACGCTTGG GGCGTTTGCG ATCCTTCCGG 

501 TCAGAGCGAA TTGACATTTT TCCCATTGTA TATCGGTTCA ACCAAATTTA 

551 TTTTGGTTAT CGCCGGCATT CCCGATTTGA GCAAAGAGGC ATTTGTTACT 

601 TTGGTAAGGA TTTTATACCG CCGTTACAGC AACCGCGTGT AA 

This corresponds to the amino acid sequence <SEQ ID 618; ORF143ng-1>:

1 MESTLSLQAN LYPCLTPAGA FYAVSSDAPS AGKTLLRSLL KADADEVVSS

51 EKLLAADTAD IDTALNLLYR LQKLEFLYGD ENGHSDGINL SDEQLPLLME

101 QLSGSGKALL VDRNGLYLAN ANFHHESAEE LGLLAAEVAQ MEKKYRLLIR

151 NNLYINNNAW GVCDPSGQSE LTFFPLYIGS TKFILVIAGI PDLSKEAFVT

201 LVRILYRRYS NRV* 

ORF143ng-1 and ORF143-1 show 95.8% identity in 214 aa overlap:

orf 143ng-l .pep MESTLSLQANLYPCLTPAGAFYAVSSDAPSAGKTLLRSLLKADADEWSSEKLLA-ADTA 59 

I t I I I I M I I t I I I I I I I I I I I I I t I It M It I i I : I I I t I t I I I : I I I I ) I I : lilt 
orf 14 3-1 MESTLSLQANLYPRLTPAGAFYAVSSDAPSAGKTLLHSLLKADADEMVSSEKLLTWADTA 60 

orf 143ng-l . pep DIDTALNLLYRLQKLEFLYGDENGHSDGINLSDEQLPLLMEQLSGSGKALLVDRNGLYLA 119 

I I II II I I I II II I I II I M I I I II I I I I I II I I II I I t I I II I I II I I j] I I I II I i II 
orf 143-1 DIDTALNLLYRLQKLEFLYGDENGHSDGINLSDEQLPLLMEQLSGSGKALLVDRNGLYLA 120 

orf 143ng-l , pep NANFHHESAEELGLLAAEVAQMEKKYRLLIRNNLYINNNAWGVCDPSGQSELTFFPLYIG 179 

I I I I I I I : I I I II I I I I I I i I I I I I I I I I I : I I I I I I I I I I I I I I I I I I I I I I I I I I I I I 
or f 1 4 3- 1 NANFHHEAAEELGLLA/ffiVAQMEKKYRLLIKNNLYINNNAWGVCDPSGQSELT FFPLYIG 180 

orfl43ng-l.pep STKFILVIAGIPDLSKEAFVTLVRILYRRYSNRV 213 

IIMIIII:|IIII:|IIIIIIIIIIIIIIIIII 
orfl43-l STKFI LVI GG I PDLGKEAFVTLVRI LYRR YSNRV 214 

Based on the presence of the putative transmembrane domains in the gonococcal protein, it is
predicted that the proteins from N. meningitidis and N. gonorrhoeae, and their epitopes, could be
useful antigens for vaccines or diagnostics, or for raising antibodies.



Example 74 

The following partial DNA sequence was identified in N. meningitidis <SEQ ID 619>: 

1 ATGACCTTTT TACAACGTTT GCAAGGTTTG GCAGACAATA AAATCTGTGC 

51 GTTTGCATGG TTCGTCGTCC GCCGCTTTGA TGAAGAACGC GTACCGCAGr 

101 CGGCGGCAAG CATGACGTTT ACGACGCTGC TGGCACTCGT CCCCGTGCTG 

151 ACCGTGATGG TGGCGGTCGC TTCGATTTTC CCCGTGTTCG ACCGCTGGTC 

201 GGATTCGTTC GTCTCCTTCG TCAACCAAAC CATTGTGCCG CA.GGCGCGG 

251 ACATGGTGTT CGACTATATC AATGCGTTCC GCGAGCAGGC GAACCGGCTG 

301 ACGGCAATCG GCAGCGTGAT GCTGGTCGTT ACCTCGCTGA TGCTGATTCG 

351 GACGATAGAC AATACGTTCA ACCGCATCTG GaCGGGTCAA wTyCCAGCGT 

401 CCGTGGATG.. 
This corresponds to the amino acid sequence <SEQ ID 620; ORF144>:

1 MTFLQRLQGL ADNKICAFAW FVVRRFDEER VPQXAASMTF TTLLALVPVL
51 TVMVAVASIF PVFDRWSDSF VSFVNQTIVP XGADMVFDYI NAFREQANRL
101 TAIGSVMLVV TSLMLIRTID NTFNRIWRVX XQRPWM..

Further work revealed the complete nucleotide sequence <SEQ ID 621>: 



1 ATGACCTTTT TACAACGTTT GCAAGGTTTG GCAGACAATA AAATCTGTGC 

51 GTTTGCATGG TTCGTCGTCC GCCGCTTTGA TGAAGAACGC GTACCGCAGG 

101 CGGCGGCAAG CATGACGTTT ACGACGCTGC TGGCACTCGT CCCCGTGCTG 

151 ACCGTGATGG TGGCGGTCGC TTCGATTTTC CCCGTGTTCG ACCGCTGGTC 

201 GGATTCGTTC GTCTCCTTCG TCAACCAAAC CATTGTGCCG CAGGGCGCGG 

251 ACATGGTGTT CGACTATATC AATGCGTTCC GCGAGCAGGC GAACCGGCTG 

301 ACGGCAATCG GCAGCGTGAT GCTGGTCGTT ACCTCGCTGA TGCTGATTCG 

351 GACGATAGAC AATACGTTCA ACCGCATCTG GCGGGTCAAT TCCCAGCGTC 

401 CGTGGATGAT GCAGTTTCTC GTCTATTGGG CTTTACTGAC GTTCGGGCCG 

451 CTGTCTTTGG GCGTGGGCAT TTCCTTTATG GTCGGCTCGG TACAGGATGC 

501 CGCGCTTGCC TCAGGTGCGC CGCAGTGGTC GGGCGCGTTG CGAACGGCGG 

551 CGACGCTGAC CTTCATGACG CTTTTGCTGT GGGGGCTGTA CCGCTTCGTG 

601 CCAAACCGCT TCGTTCCCGC GCGGCAGGCG TTTGTCGGGG CTTTGGCAAC 

651 AGCGTTTTGT CTGGAAACCG CGCGCTCCCT CTTCACTTGG TATATGGGCA 

701 ATTTCGACGG CTACCGCTCG ATTTACGGCG CGTTTGCCGC CGTGCCGTTT 

751 TTTCTGTTGT GGCTGAACCT GTTGTGGACG CTGGTCTTGG GCGGCGCGGT 

801 GCTGACTTCT TCACTCTCCT ACTGGCAGGG AGAAGCGTTC CGCAGGGGCT

851 TCGACTCGCG CGGACGGTTT GACGACGTGT TGAAAATCCT GCTGCTTCTG 

901 GATGCGGCGC AAAAAGAAGG CAAAGCCTTG CCTGTTCAGG AGTTCAGACG 

951 GCATATCAAT ATGGGCTACG ACGAGTTGGG CGAGCTTTTG GAAAAGCTGG 

1001 CGCGGCACGG CTACATCTAT TCCGGCAGAC AGGGTTGGGT GTTGAAAACG 

1051 GGGGCGGATT CGATTGAGTT GAACGAACTC TTCAAGCTCT TCGTTTACCG 

1101 TCCGTTGCCT GTGGAAAGGG ATCATGTGAA CCAAGCTGTC GATGCGGTAA 

1151 TGACACCGTG TTTGCAGACT TTGAACATGA CGCTGGCAGA GTTTGACGCT 

1201 CAGGCGAAAA AACGGCAGTA G 

This corresponds to the amino acid sequence <SEQ ID 622; ORF144-1>:

1 MTFLQRLQGL ADNKICAFAW FVVRRFDEER VPQAAASMTF TTLLALVPVL

51 TVMVAVASIF PVFDRWSDSF VSFVNQTIVP QGADMVFDYI NAFREQANRL

101 TAIGSVMLVV TSLMLIRTID NTFNRIWRVN SQRPWMMQFL VYWALLTFGP

151 LSLGVGISFM VGSVQDAALA SGAPQWSGAL RTAATLTFMT LLLWGLYRFV

201 PNRFVPARQA FVGALATAFC LETARSLFTW YMGNFDGYRS IYGAFAAVPF

251 FLLWLNLLWT LVLGGAVLTS SLSYWQGEAF RRGFDSRGRF DDVLKILLLL

301 DAAQKEGKAL PVQEFRRHIN MGYDELGELL EKLARHGYIY SGRQGWVLKT 

351 GADSIELNEL FKLFVYRPLP VERDHVNQAV DAVMTPCLQT LNMTLAEFDA 

401 QAKKRQ* 

Computer analysis of this amino acid sequence gave the following results: 



Homology with a predicted ORF from N. meningitidis (strain A)

ORF144 shows 96.3% identity over a 136aa overlap with an ORF (ORF144a) from strain A of N.
meningitidis:



10 20 30 40 50 60 

orf 14 4 . pep MTFLQRLQGLADNKICAFAW FWRRFDEERVPQXAASMTFTT LLALVPVLTVMVAVASI F 
t I I I I I I I I I I I I I M I I I I I I I I I I I t ) I I I I I I I I I I I I I I I I i I i I I I i I t I I I I I 
o r f 1 4 4 a MTFLQRLQGLADNKICAFA WFWRRFDEERVPQAAASMT FTT LLALVPVLTVMVAVAS I F 

10 20 30 40 50 60 



70 80 90 100 110 120 

orf 14 4 - pep PVFDRWSDSFVSFVNQTIVPXGADMVFDYINAFREQAN RLTAIGSVMLWTSLML IRTID 
I I I I I I I t ) I I I t I I I I I I I I I 1 It i I I I I I I I I t I t I i I I M I M i I I M I I I I I I i 
orf 14 4a PVFDRWSDSFVSFVNQTIVPQGADMVFDYINAFREQANR LTAIGSVMLWTSXML IRTID 

70 80 90 100 110 120 



130 

orf144.pep NTFNRIWRVXXQRPWM 
I I I I I I I M mil 

orf144a NTFNRIWRVNSQRPWMMQFLVYWALLTFGPLSLGVGISFXVGSVQDAALASGAPQWSGAL 
130 140 150 160 170 180 

The complete length ORF144a nucleotide sequence <SEQ ID 623> is: 

1 ATGACCTTTT TACAACGTTT GCAAGGTTTG GCAGACAATA AAATCTGTGC 

51 GTTTGCATGG TTCGTCGTCC GCCGCTTTGA TGAAGAACGC GTACCGCAGG 

101 CGGCGGCAAG CATGACGTTT ACGACACTGC TGGCACTCGT CCCCGTGCTG 

151 ACCGTGATGG TGGCGGTCGC TTCGATTTTC CCCGTGTTCG ACCGNTGGTC 

201 GGATTCGTTC GTCTCCTTCG TCAACCAAAC CATTGTGCCG CAGGGCGCGG 

251 ACATGGTNTT CGACTATATC AATGCGTTCC GCGAGCAGGC GAACCGGCTG 

301 ACGGCAATCG GCAGCGTGAT GCTGGTCGTT ACCTCGCNGA TGCTGATTCG 

351 GACGATAGAC AATACGTTCA ACCGCATCTG GCGGGTCAAT TCCCAGCGTC 

401 CGTGGATGAT GCAGTTTCTC GTCTATTGGG CTTTACTGAC GTTCGGGCCG 

451 CTGTCTTTGG GCGTGGGCAT TTCCTTTATN GTCGGCTCGG TACAGGATGC 

501 CGCGCTTGCC TCAGGTGCGC CGCAGTGGTC GGGCGCGTTG CGAACGGCGG 

551 CGACGCTGAN CTTCATGACG CTTTTGCTGT GGGGGCTGTA CCGCTNCGTG 

601 CCAAACCGCT TCGTTCCCGC GCGGCANGCG TTTGTCGGGG CTTTGGCAAC 

651 AGCGTTCTGT CTGGAAACCG CGCGTTCCCT CTTTACTTGG TATATGGGCA 

701 ATTTCGACGG CTACCGCTCG ATTTACGGNG CGTTTGCCGC CGTGCCGTTT 

751 TTTCTGTTGT GGCTGAACCT GTTGTGGACG CTGGTCTTGG GCGGCGCGGT 

801 GCTGACTTCT TCACTCTCCT ACTGGCAGGG AGAAGCGTTC CGCAGGGNCT 

851 TCGACTCGCG CGGACGGTTT GACGACGTGT TGAAAATCCT GCTGCTTCTG 

901 GATGCGGCGC AAAAAGAAGG CNAAGCCTTG CCTGTTCAGG AGTTCAGACG 

951 GCATATCAAT ATGGGCTACG ACGAGTTGGG CGAGCTTTTG GAAAAGCTGG 

1001 CGCGGCACGG CTACATCTAT TCCGGCAGAC AGGGTTGGGT GTTGAAAACG 

1051 GGGGCGGATT CGATTGAGTT GAACGAACTC TTCAAGCTCT TCGTTTACCG 

1101 TCCGTTGCCT GTGGAAAGGG ATCATGTGAA CCAAGCTGTC GATGCGGTAA 

1151 TGATGCCGTG TTTGCAGACT TTGAACATGA CGCTGGCAGA GTTTGACGCT 

1201 CAGGCGAAAA AACAGCAGCA ATCTTGA 

This encodes a protein having amino acid sequence <SEQ ID 624>: 

1 MTFLQRLQGL ADNKICAFAW FVVRRFDEER VPQAAASMTF TTLLALVPVL 

51 TVMVAVASIF PVFDRWSDSF VSFVNQTIVP QGADMVFDYI NAFREQANRL 

101 TAIGSVMLVV TSXMLIRTID NTFNRIWRVN SQRPWMMQFL VYWALLTFGP 

151 LSLGVGISFX VGSVQDAALA SGAPQWSGAL RTAATLXFMT LLLWGLYRXV 

201 PNRFVPARXA FVGALATAFC LETARSLFTW YMGNFDGYRS IYGAFAAVPF 

251 FLLWLNLLWT LVLGGAVLTS SLSYWQGEAF RRXFDSRGRF DDVLKILLLL 

301 DAAQKEGXAL PVQEFRRHIN MGYDELGELL EKLARHGYIY SGRQGWVLKT 

351 GADSIELNEL FKLFVYRPLP VERDHVNQAV DAVMMPCLQT LNMTLAEFDA 

401 QAKKQQQS* 

ORF144a and ORF144-1 show 97.8% identity in 406 aa overlap: 

orf144a.pep MTFLQRLQGLADNKICAFAWFWRRFDEERVPQAAASMTFTTLLALVPVLTVMVAVASIF 
[ I 1 I I I t I I I I I i [ I I M I I t I I I I I I I i I I I I I I I I M I M I I I I t I I t I I I I t I I t t I 
orf144-1 MTFLQRLQGLADNKICAFAWFWRRFDEERVPQAAASMTFTTLLALVPVLTVMVAVASIF 

orfl44a.pep PVFDRWSDSFVSFVNQTIVPQGADMVFDYINAFREQANRLTAIGSVMLWTSXMLIRTID 
I I t I i i I I I I I I I i M I I I I I 1 I t I I I I I I t ! i I I I t t i I I I I I I I I I t t I I I [ I I t I I 
orf 14 4-1 PVFDRWSDSFVSFVNQTIVPQGADMVFDYINAFREQANRLTAIGSVMLWTSLMLIRTID 

orf 144a. pep NTFNRIWRVNSQRPWMMQFLVYWALLTFGPLSLGVGISFXVGSVQDAALASGAPQWSGAL 

i 1 1 1 M i 1 1 1 1 1 1 1 1 It 1 1 1 M 1 1 1 1 n M t i 1 1 1 1 i n 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 M 

orf 144-1 NTFNRIWRVNSQRPWMMQFLVYWALLTFGPLSLGVGISFMVGSVQDAALASGAPQWSGAL 

or f 14 4a . pep RTAATLXFMTLLLWGLYRXVPNRFVPARXAFVGALATAFCLETARSLFTWYMGNFDGYRS 
I I i I I I : I I I I I I I I t I I M i I t I t M I i I I I I I I I M I I I I I I I I I I I I I I I I I I I I 
orf144-1 RTAATLTFMTLLLWGLYRFVPNRFVPARQAFVGALATAFCLETARSLFTWYMGNFDGYRS 

orf 14 4a . pep I YGAFAAVPFFLLWLNLLWTLVLGGAVLTSSLSYWQGEAFRRXFDSRGRFDDVLKILLLL 
I I t t I I I I I I t I ! I I I I M I I I t I I t I I M I I I I M I t f I I I I I t I I I I i I I I i I I I I I 
orf 144-1 lYGAFAAVPFFLLWLNLLWTLVLGGAVLTSSLSYWQGEAFRRGFDSRGRFDDVLKILLLL 

orf144a.pep DAAQKEGXALPVQEFRRHINMGYDELGELLEKLARHGYIYSGRQGWVLKTGADSIELNEL 
i I I I I I I I I I I I I I I I I t I I I t I I I I M I I I I I I I M I I I I [ I I I I I i I I I t I I I t I I I 
orf 14 4-1 DAAQKEGKALPVQEFRRHINMGYDELGELLEKLARHGYIYSGRQGWVLKTGADSIELNEL 

orf144a.pep FKLFVYRPLPVERDHVNQAVDAVMMPCLQTLNMTLAEFDAQAKKQQQS 408 

I i I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I i t M : I 
orf 144-1 FKLFVYRPLPVERDHVNQAVDAVMTPCLQTLNMTLAEFDAQAKKRQ 406 
Homology with a predicted ORF from N.gonorrhoeae 

ORF144 shows 91.2% identity over a 136aa overlap with a predicted ORF (ORF144ng) from 
N.gonorrhoeae: 



orf144.pep MTFLQRLQGLADNKICAFAWFWRRFDEERVPQXAASMTFTTLLALVPVLTVMVAVASIF 60 
Mill II tlllllMlllhllhltllll I I I I I I I I I I I t I t I I i II I I t I I I t 
orf144ng MTFLQCWQGSADNKICAFAWFVIRRFSEERVPQAAASMTFTTLLALVPVLTVMVAVASIF 60 

orf144.pep PVFDRWSDSFVSFVNQTIVPXGADMVFDYINAFREQANRLTAIGSVMLWTSLMLIRTID 120 
I II I II I I I ( II III M I II lllltlltl:tll:IIIIMIIIIItltlllllllllll 
orf144ng PVFDRWSDSFVSFVNQTIVPQGADMVFDYIDAFRDQANRLTAIGSVMLWTSLMLIRTID 120 

orf144.pep NTFNRIWRVXXQRPWM 136 
1:1111111 :.l I I II 
orf144ng NAFNRIWRVNTQRPWMMQFLVYWALLTFGPLSLGVGISFMVGSVQDSVLSSGAQQWADAL 180 



The complete length ORF144ng nucleotide sequence <SEQ ID 625> is predicted to encode a 
protein having amino acid sequence <SEQ ID 626>: 



1 MTFLQCWQGS ADNKICAFAW FVIRRFSEER VPQAAASMTF TTLLALVPVL 

51 TVMVAVASIF PVFDRWSDSF VSFVNQTIVP QGADMVFDYI DAFRDQANRL 

101 TAIGSVMLVV TSLMLIRTID NAFNRIWRVN TQRPWMMQFL VYWALLTFGP 

151 LSLGVGISFM VGSVQDSVLS SGAQQWADAL KTAARLAFMT LLLWGLYRFV 

201 PNRFVPARQA FVGALITAFC LETARFLFTW YMGNFCCYRS IYGAFAAVPF 

251 FLLWLNLLWT LVLGGAVLTS SLSYWQGEAF RRGFDSRGRF DDVLKILLLL 

301 DAAQKEGRTL SVQEFRRHIN MGYDELGELL EKLARYGYIY SGRQGWVLKT 

351 GADSIELSEL FKLFVYRPLP VERDHVNQAV DAVMTPCLQT LNMTLAEFDA 

401 QAKKQQQS* 

Further work revealed the following gonococcal DNA sequence <SEQ ID 627>: 



1 ATGACCTTTT TACAACGTTG GCAAGGTTTG GCGGACAATA AAATCTGTGC 

51 ATTTGCATGG TTCGTCATCC GCCGTTTCAG TGAAGAGCGC GTACCGCAGG 

101 CAGCGGCGAG CATGACGTTT ACGACACTGC TGGCACTCGT CCCCGTACTG 

151 ACCGTAATGG TCGCGGTCGC TTCGATTTTC CCCGTGTTCG ACCGCTGGTC 

201 GGATTCGTTC GTCTCCTTCG TCAACCAAAC CATTGTGCCG CAGGGCGCGG 

251 ATATGGTGTT CGACTATATC GACGCATTCC GCGATCAGGC AAACCGGCTG 

301 ACCGCCATCG GCAGCGTGAT GCTGGTCGTA ACCTCGCTGA TGCTGATTCG 

351 GACGATAGAC AATGCGTTCA ACCGCATCTG GCGGGTTAAC ACGCAACGCC 

401 CCTGGATGAT GCAGTTCCTC GTTTATTGGG CGTTGCTGAC TTTCGGGCCT 

451 TTGTCTTTGG GTGTGGGCAT TTCCTTTATG GTCGGGTCGG TTCAAGACTC 

501 CGTACTCTCC TCCGGAGCGC AACAATGGGC GGACGCGTTG AAGACGGCGG 

551 CAAGGCTGGC TTTCATGACG CTTTTGCTGT GGGGGCTGTA CCGCTTCGTG 

601 CCCAACCGCT TCGTGCCCGC CCGGCAGGCG TTTGTCGGAG CTTTGATTAC 

651 GGCATTCTGC CTGGAGACGG CACGTTTCCT GTTCACCTGG TATATGGGCA 

701 ATTTCGACGG CTACCGCTCG ATTTACGGCG CATTTGCCGC CGTGCCGTTT 

751 TTCCTGCTGT GGTTAAACCT GCTGTGGACG CTGGTCTTGG GCGGGGCGGT 

801 GCTGACTTCG TCGCTGTCTT ATTGGCAGGG CGAGGCCTTC CGCAGGGGAT 

851 TCGACTCGCG CGGACGGTTT GACGACGTGT TGAAAATCCT GCTGCTTCTG 

901 GATGCGGCGC AAAAAGAAGG CCGAACCCTG TCCGTTCAGG AGTTCAGACG 

951 GCATATCAAT ATGGGTTACG ATGAATTGGG CGAGCTTTTG GAAAAGCTGG 

1001 CGCGGTACGG CTATATCTAT TCCGGCAGAC AGGGCTGGGT TTTGAAAACG 

1051 GGGGCGGATT CGATTGAGTT GAGCGAACTC TTCAAGCTCT TCGTGTACCG 

1101 CCCGTTGCct gtggaAAGGG ATCATGTGAA CCAAGCTGtc gaTGCGGTAA 

1151 TGAcgccgtG TTTGCAGACT TTGAACATGA CGCTGGCGGA GTTTGACGCT 

1201 CAGgcgAAAA AACAGCAGCA GTCTTGA 

This encodes a variant of ORF 144ng, having the amino acid sequence <SEQ ID 628; ORF144ng-l>: 



1 MTFLQRWQGL ADNKICAFAW FVIRRFSEER VPQAAASMTF TTLLALVPVL 

51 TVMVAVASIF PVFDRWSDSF VSFVNQTIVP QGADMVFDYI DAFRDQANRL 

101 TAIGSVMLVV TSLMLIRTID NAFNRIWRVN TQRPWMMQFL VYWALLTFGP 

151 LSLGVGISFM VGSVQDSVLS SGAQQWADAL KTAARLAFMT LLLWGLYRFV 

201 PNRFVPARQA FVGALITAFC LETARFLFTW YMGNFDGYRS IYGAFAAVPF 

251 FLLWLNLLWT LVLGGAVLTS SLSYWQGEAF RRGFDSRGRF DDVLKILLLL 

301 DAAQKEGRTL SVQEFRRHIN MGYDELGELL EKLARYGYIY SGRQGWVLKT 

351 GADSIELSEL FKLFVYRPLP VERDHVNQAV DAVMTPCLQT LNMTLAEFDA 

401 QAKKQQQS* 

ORF144ng-l and ORF144-1 show 94.1% identity in 406 aa overlap: 

orf 14 4ng-l . pep MTFLQRWQGLADNKICAFAWFVIRRFSEERVPQAAASMTFTTLLALVPVLTVMVAVASIF 
null [lilillllllllM:lll:llllltllllllllllMlllillMIIIIIII 
orf 1 44-1 MTFLQRLQGLADNKICAFAWFWRRFDEERVPQAAASMTFTTLLALVPVLTVMVAVASIF 

orf 144ng-l .pep PVFDRWSDSFVSFVNQTIVPQGADMVFDYIDAFRDQANRLTAIGSVMLWTSLMLIRTID 
I I I I I I M I i I t M I I II 1 II I I II I I I I I : II I: I I II t I I I II I I I I I I II II t I t I I 
orf144-1 PVFDRWSDSFVSFVNQTIVPQGADMVFDYINAFREQANRLTAIGSVMLWTSLMLIRTID 

or f 14 4ng-l . pep NAFNRIWRVNTQRPWMMQFLVYWALLTFGPLSLGVGISFMVGSVQDSVLSSGAQQWADAL 
I : I M I I I I I : II I I II II II I II I I I II I I II I I I I I I I II I M t :: i : I I I t I : II 
orf 14 4-1 NTFNRIWRVNSQRPWMMQFLVYWALLTFGPLSLGVGISFMVGSVQDAALASGAPQWSGAL 

orf 144ng-l .pep KTAARLAFMTLLLWGLYRFVPNRFVPARQAFVGALITAFCLETARFLFTWYMGNFDGYRS 
: I I I I : I II t t I II I I I I I II I I I I I i I I I I I I I II I II I I II I I I I I M I I I t I I I 
orf 1 4 4-1 RTAATLTFMTLLLWGLYRFVPNRFVPARQAFVGALATAFCLETARSLFTWYMGNFDGYRS 

orf 14 4ng-l . pep lYGAFAAVPFFLLWLNLLWTLVLGGAVLTSSLSYWQGEAFRRGFDSRGRFDDVLKILLLL 

1 1 1 1 1 1 1 1 1 1 i I i 1 1 It 1 1 1 1 1 1 1 i 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 n I I I I I I I I I I I 

orf 14 4-1 lYGAFAAVPFFLLWLNLLWTLVLGGAVLTSSLSYWQGEAFRRGFDSRGRFDDVLKILLLL 

orfl4 4ng-l.pep DAAQKEGRTLSVQEFRRHINMGYDELGELLEKLARYGYIYSGRQGWVLKTGADSIELSEL 
I I I I I I I :: I I I I II I I I I I I I I I I I I I I I I I I I : i I I II i I I I I I I t I I I i I t i I : I I 
orf 14 4-1 DAAQKEGKALPVQEFRRHINMGYDELGELLEKLARHGYIYSGRQGWVLKTGADSIELNEL 

orf144ng-1.pep FKLFVYRPLPVERDHVNQAVDAVMTPCLQTLNMTLAEFDAQAKKQQQS 

I I I I I I i I I I I II I II I i I I I I t I I I II i I I I I I i I I I I I I I I i : I 
orf 14 4-1 FKLFVYRPLPVERDHVNQAVDAVMTPCLQTLNMTLAEFDAQAKKRQ 

On the basis of this analysis, including the identification of several putative transmembrane 
domains in the gonococcal protein, it is predicted that the proteins from N.meningitidis and 
N.gonorrhoeae, and their epitopes, could be useful antigens for vaccines or diagnostics, or for 
raising antibodies. 
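
The text does not state which program identified the putative transmembrane domains. Purely as an illustration, one widely used heuristic is a Kyte-Doolittle hydropathy scan: average the hydropathy of the residues in a sliding window and flag windows whose mean is high. The sketch below assumes the conventional window of 19 residues and threshold of 1.6; the function names are hypothetical and this is not presented as the analysis actually performed.

```python
# Kyte-Doolittle hydropathy scale (one value per standard amino acid).
KD = {
    "A": 1.8, "R": -4.5, "N": -3.5, "D": -3.5, "C": 2.5,
    "Q": -3.5, "E": -3.5, "G": -0.4, "H": -3.2, "I": 4.5,
    "L": 3.8, "K": -3.9, "M": 1.9, "F": 2.8, "P": -1.6,
    "S": -0.8, "T": -0.7, "W": -0.9, "Y": -1.3, "V": 4.2,
}


def hydropathy_profile(seq, window=19):
    """Return the mean hydropathy of every full window along seq."""
    # Unknown residues (for example X) are scored as 0 (an assumption).
    scores = [KD.get(aa, 0.0) for aa in seq.upper()]
    return [
        sum(scores[i:i + window]) / window
        for i in range(len(scores) - window + 1)
    ]


def candidate_tm_starts(seq, window=19, threshold=1.6):
    """Return 0-based start positions of windows at or above threshold."""
    profile = hydropathy_profile(seq, window)
    return [i for i, mean in enumerate(profile) if mean >= threshold]
```

Runs of consecutive start positions returned by `candidate_tm_starts` would then be merged into candidate membrane-spanning segments and inspected by hand.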



Example 75 



The following partial DNA sequence was identified in N.meningitidis <SEQ ID 629>: 



1 ..AGACACGCCC GCCGCATCCG CATCGACACC GCCATCAACC CCGAACTGGA 

51 AGCCCTCGCC GAACACCTCC ACTACCAATG GCAGGGCTTC CTCTGGCTCA 

101 GCACCGATAT GCGTCAGGAA ATTTCCGCCC TCGTCATCCT GCTGCAACGC 

151 ACCCGCCGCA AATGGCTGGA TGCCCACGAA CGCCAACACC TGCGCCAAAG 

201 CCTGCTTGAA ACACGGGAAC ACGGCTGA 

This corresponds to the amino acid sequence <SEQ ID 630; ORF146>: 

1 . .RHARRIRIDT AINPELEALA EHLHYQWQGF LWLSTDMRQE ISALVILLQR 

51 TRRKWLDAHE RQHLRQSLLE TREHG* 
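
The correspondence between a nucleotide listing and its amino acid sequence can be checked by translating the DNA codon by codon. The following is a minimal sketch using the standard genetic code; the function name `translate` and the choice to report unreadable codons as `X` are assumptions made for illustration.

```python
from itertools import product

# Standard genetic code, codons enumerated in T, C, A, G order.
BASES = "TCAG"
AMINO = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    "".join(codon): aa for codon, aa in zip(product(BASES, repeat=3), AMINO)
}


def translate(dna):
    """Translate DNA in frame 1; '*' marks a stop, 'X' an unreadable codon."""
    dna = "".join(dna.split()).upper()  # drop the spaces between 10-mers
    usable = len(dna) - len(dna) % 3    # ignore a trailing partial codon
    return "".join(
        CODON_TABLE.get(dna[i:i + 3], "X") for i in range(0, usable, 3)
    )


# First 30 nucleotides of the partial ORF146 sequence listed above.
print(translate("AGACACGCCC GCCGCATCCG CATCGACACC"))  # RHARRIRIDT
```

The printed peptide matches the first ten residues given for ORF146.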



Further work revealed the complete nucleotide sequence <SEQ ID 631>: 

1 ATGAACACCT CGCAACGCAA CCGCCTCGTC AGCCGCTGGC TCAACTCCTA 

51 CGAACGCTAC CGCTACCGCC GCCTCATCCA CGCCGTCCGG CTCGGCGGGG 

101 CCGTCCTGTT CGCCACCGCC TCCGCCCGGC TGCTCCACCT CCAACACGGC 

151 GAGTGGATAG GGATGACCGT CTTCGTCGTC CTCGGCATGC TCCAGTTTCA 

201 AGGGGCGATT TACTCCAAGG CGGTGGAACG TATGCTCGGC ACGGTCATCG 

251 GGCTGGGCGC GGGTTTGGGC GTTTTATGGC TGAACCAGCA TTATTTCCAC 

301 GGCAACCTCC TCTTCTACCT CACCGTCGGC ACGGCAAGCG CACTGGCCGG 

351 CTGGGCGGCG GTCGGCAAAA ACGGCTACGT CCCTATGCTG GCAGGGCTGA 

401 CGATGTGTAT GCTCATCGGC GACAACGGCA GCGAATGGCT CGACAGCGGA 

451 CTCATGCGCG CCATGAACGT GCTCATCGGC GCGGCCATCG CCATCGCCGC 
501 CGCCAAACTG CTGCCGCTGA AATCCACACT GATGTGGCGT TTCATGCTTG 

551 CCGACAACCT GGCCGACTGC AGCAAAATGA TTGCCGAAAT CAGCAACGGC 

601 AGGCGCATGA CCCGCGAACG CCTCGAGGAG AACATGGCGA AAATGCGCCA 

651 AATCAACGCA CGCATGGTCA AAAGCCGCAG CCATCTCGCC GCCACATCGG 

701 GCGAAAGCCG CATCAGCCCC GCCATGATGG AAGCCATGCA GCACGCCCAC 

751 CGTAAAATCG TCAACACCAC CGAGCTGCTC CTGACCACCG CCGCCAAGCT 

801 GCAATCTCCC AAACTCAACG GCAGCGAAAT CCGGCTGCTT GACCGCCACT 

851 TCACACTGCT CCAAACCGAC CTGCAACAAA CCGTCGCCCT TATCAACGGC 

901 AGACACGCCC GCCGCATCCG CATCGACACC GCCATCAACC CCGAACTGGA 

951 AGCCCTCGCC GAACACCTCC ACTACCAATG GCAGGGCTTC CTCTGGCTCA 

1001 GCACCAATAT GCGTCAGGAA ATTTCCGCCC TCGTCATCCT GCTGCAACGC 

1051 ACCCGCCGCA AATGGCTGGA TGCCCACGAA CGCCAACACC TGCGCCAAAG 

1101 CCTGCTTGAA ACACGGGAAC ACGGCTGA 


This corresponds to the amino acid sequence <SEQ ID 632; ORF146-1>: 



1 MNTSQRNRLV SRWLNSYERY RYRRLIHAVR LGGAVLFATA SARLLHLQHG 

51 EWIGMTVFVV LGMLQFQGAI YSKAVERMLG TVIGLGAGLG VLWLNQHYFH 

101 GNLLFYLTVG TASALAGWAA VGKNGYVPML AGLTMCMLIG DNGSEWLDSG 

151 LMRAMNVLIG AAIAIAAAKL LPLKSTLMWR FMLADNLADC SKMIAEISNG 

201 RRMTRERLEE NMAKMRQINA RMVKSRSHLA ATSGESRISP AMMEAMQHAH 

251 RKIVNTTELL LTTAAKLQSP KLNGSEIRLL DRHFTLLQTD LQQTVALING 

301 RHARRIRIDT AINPELEALA EHLHYQWQGF LWLSTNMRQE ISALVILLQR 

351 TRRKWLDAHE RQHLRQSLLE TREHG* 

Computer analysis of this amino acid sequence gave the following results: 



Homology with a predicted ORF from N.meningitidis (strain A) 

ORF146 shows 98.6% identity over a 74aa overlap with an ORF (ORF146a) from strain A of 
N.meningitidis: 



10 20 30 

orf 14 6 . pep RHARRIRIDTAINPELEALAEHLHYQWQGF 

I I I M I I I I I I I I I i I I I I I I I I I I I I I I I 
orf 14 6a KLNGSEIRLLDRHFTLLQTDLQQTVALINGRHARRIRIDTAINPELEALAEHLHYQWQGF 
280 290 300 310 320 330 



40 50 60 70 

orf146.pep LWLSTDMRQEISALVILLQRTRRKWLDAHERQHLRQSLLETREHGX 

I I I I I : I I t t I M t I I i I I I I I I I i I t i I t I t I I I M t I I I I I I : 
orf146a LWLSTNMRQEISALVILLQRTRRKWLDAHERQHLRQSLLETREHSX 

340 350 360 370 

The complete length ORF146a nucleotide sequence <SEQ ID 633> is: 



1 ATGAACACCT CGCAACGCAA CCGCCTCGTC AGCCGCTGGC TCAACTCCTA 

51 CGAACGCTAC CGCTACCGCC GCCTCATCCA CGCCGTCCGG CTCGGCGGGG 

101 CCGTCCTGTT CGCCACCGCC TCCGCCCGGC TGCTCCACCT CCAACACGGC 

151 GAGTGGATAG GGATGACCGT CTTCGTCGTC CTCGGCATGC TCCAGTTTCA 

201 AGGGGCGATT TACTCCAAGG CGGTGGAACG TATGCTCGGC ACGGTCATCG 

251 GGCTGGGCGC GGGTTTGGGC GTTTTATGGC TGAACCAGCA TTATTTCCAC 

301 GGCAACCTCC TCTTCTACCT CACCGTCGGC ACGGCAAGCG CACTGGCCGG 

351 CTGGGCGGCG GTCGGCAAAA ACGGCTACGT CCCTATGCTG GCGGGGCTGA 

401 CGATGTGCAT GCTCATCGGC GACAACGGCA GCGAATGGTT CGACAGCGGC 

451 CTGATGCGCG CGATGAACGT GCTCATCGGC GCGGCCATCG CCATCGCCGC 

501 CGCCAAACTG CTGCCGCTGA AATCCACACT GATGTGGCGT TTCATGCTTG 

551 CCGACAACCT GACCGACTGC AGCAAAATGA TTGCCGAAAT CAGCAACGGC 

601 AGGCGCATGA CCCGCGAACG CCTCGAAGAG AACATGGCGA AAATGCGCCA 

651 AATCAACGCA CGCATGGTCA AAAGCCGCAG CCACCTCGCC GCCACATCGG 

701 GCGAAAGCCG CATCAGCCCC GCCATGATGG AAGCCATGCA GCACGCCCAC 

751 CGTAAAATTG TCAACACCAC CGAGCTGCTC CTGACCACCG CCGCCAAGCT 

801 GCAATCTCCC AAACTCAACG GCAGCGAAAT CCGGCTGCTT GACCGCCACT 

851 TCACACTGCT CCAAACCGAC CTGCAACAAA CCGTCGCCCT TATCAACGGC 

901 AGACACGCCC GCCGCATCCG CATCGACACC GCCATCAACC CCGAACTGGA 

951 AGCCCTCGCC GAACACCTCC ACTACCAATG GCAGGGCTTC CTCTGGCTCA 

1001 GCACCTATAT GCGTCAGGAA ATTTCCGCCC TCGTCATCCT GCTGCAACGC 

1051 ACCCGCCGCA AATGGCTGGA TGCCCACGAA CGCCAACACC TGCGCCAAAG 

1101 CCTGCTTGAA ACACGGGAAC ACAGTTGA 
This encodes a protein having amino acid sequence <SEQ ID 634>: 

1 MNTSQRNRLV SRWLNSYERY RYRRLIHAVR LGGAVLFATA SARLLHLQHG 

51 EWIGMTVFVV LGMLQFQGAI YSKAVERMLG TVIGLGAGLG VLWLNQHYFH 

101 GNLLFYLTVG TASALAGWAA VGKNGYVPML AGLTMCMLIG DNGSEWFDSG 

151 LMRAMNVLIG AAIAIAAAKL LPLKSTLMWR FMLADNLTDC SKMIAEISNG 

201 RRMTRERLEE NMAKMRQINA RMVKSRSHLA ATSGESRISP AMMEAMQHAH 

251 RKIVNTTELL LTTAAKLQSP KLNGSEIRLL DRHFTLLQTD LQQTVALING 

301 RHARRIRIDT AINPELEALA EHLHYQWQGF LWLSTNMRQE ISALVILLQR 

351 TRRKWLDAHE RQHLRQSLLE TREHS* 



ORF146a and ORF146-1 show 99.5% identity in 374 aa overlap: 



or f 14 6a . pep MNTSQRNRLVSRWLNSYERYRYRRLIHAVRLGGAVLFATASARLLHLQHGEWIGMTVFW 

i 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 i 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 r I I I I It I M I I M M t I I I i 

orf 14 6-1 MNTSQRNRLVSRWLNSYERYRYRRLIHAVRLGGAVLFATASARLLHLQHGEWIGMTVFW 

orf 1 4 6a . pep LGMLQFQGAI YSKAVERMLGTVIGLGAGLGVLWLNQHYFHGNLLFYLTVGTASALAGWAA 
t I I I I i I I I I I I I i I I I I I I I I I I [ I I I I I It M I I I I I I I I I M I t I I I I I I 1 I I I I I i 
orf 14 6-1 LGMLQFQGAIYSKAVERMLGTVIGLGAGLGVLWLNQHYFHGNLLFYLTVGTASALAGWAA 

orf 1 4 6a . pep VGKNGYVPMLAGLTMCMLIGDNGSEWFDSGLMRAMNVLIGAAIAIAAAKLLPLKSTLMWR 
I I I I I II II I I I I I I I I i I I t i I t t I : I t I I t I I It I I I II I I I I I I I I I II I I I ! It II 
orf 14 6-1 VGKNGYVPMLAGLTMCMLIGDNGSEWLDSGLMRAMNVLIGAAIAIAAAKLLPLKSTLMWR 

orf 14 6a . pep FMLADNLTDCSKMIAEISNGRRMTRERLEENMAKMRQINARMVKSRSHLAATSGESRISP 
It I i I I I : t M I M I I t 1 I It t It I I I I I I I I I I I I I I I I I I I I I I I II I I I I I I I I I I I 
orf 14 6-1 FMLADNLADCSKMIAEISNGRRMTRERLEENMAKMRQINARMVKSRSHLAATSGESRISP 

orf 14 6a . pep AMMEAMQHAHRKIVNTTELLLTTAAKLQSPKLNGSEIRLLDRHFTLLQTDLQQTVALING 
I i I I I I I I I I I I I I II t I I I I I I I I I t I I I I I I I I II t I 1 II I II I I I I It i I I I I I I I i 
orf 146-1 AMMEAMQHAHRKIVNTTELLLTTAAKLQSPKLNGSEIRLLDRHFTLLQTDLQQTVALING 

orf 146a . pep RHARRIRIDTAINPELEALAEHLHYQWQGFLWLSTNMRQEISALVILLQRTRRKWLDAHE 
I I I I I I I II I I I I I I I t I I t I I I I I I I I I I I I I t t I II 1 I I I I I 1 I I t I I I I I I II I I I 1 
orf146-1 RHARRIRIDTAINPELEALAEHLHYQWQGFLWLSTNMRQEISALVILLQRTRRKWLDAHE 

orf 146a . pep RQHLRQSLLETREHSX 

I I I I I I I 1 I I I I I I: 
orf 14 6-1 RQHLRQSLLETREHGX 

Homology with a predicted ORF from N.gonorrhoeae 

ORF146 shows 97.3% identity over a 75aa overlap with a predicted ORF (ORF146ng) from 
N.gonorrhoeae: 

orfl46.pep RHARRIRIDTAINPELEALAEHLHYQWQGF 30 

I I I I I I I I I I II I I I I I I I I I I I M I I I 1 I 
orfl46ng KLNGSEIRLLDRHFTLLQTDLQQTAALINGRHARRIRIDTAINPELEALAEHLHYQWQGF 364 

orf 1 4 6 . pep LWLSTDMRQEISALVILLQRTRRKWLDAHERQHLRQSLLETREHG 75 

I I I I I : I I I I I I I II I I It I I I I I I 1 t I I I I I I I 1 I I I I I I I I I 
or f 1 4 6ng LWLSTNMRQEISALVI PLQRTRRKWLDAHERQHLRQSLLETREHG 409 
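
An identity figure of this kind can be reproduced for an ungapped overlap by counting matching positions. The sketch below does so for the 75 residues of ORF146 and the corresponding C-terminal residues of ORF146ng as they appear in the alignment above; the helper name `percent_identity` is hypothetical, and gapped alignments would need a real alignment tool.

```python
def percent_identity(a, b):
    """Percentage of identical positions in two equal-length sequences."""
    if len(a) != len(b):
        raise ValueError("sequences must cover the same ungapped overlap")
    matches = sum(x == y for x, y in zip(a, b))
    return 100.0 * matches / len(a)


# Sequences typed in blocks of ten residues, as in the listings.
ORF146 = "".join(
    "RHARRIRIDT AINPELEALA EHLHYQWQGF LWLSTDMRQE "
    "ISALVILLQR TRRKWLDAHE RQHLRQSLLE TREHG".split()
)
ORF146NG_TAIL = "".join(
    "RHARRIRIDT AINPELEALA EHLHYQWQGF LWLSTNMRQE "
    "ISALVIPLQR TRRKWLDAHE RQHLRQSLLE TREHG".split()
)

print(round(percent_identity(ORF146, ORF146NG_TAIL), 1))  # 97.3
```

The two sequences differ at two of 75 positions (D/N and L/P), which gives 73/75, or 97.3%.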

An ORF146ng nucleotide sequence <SEQ ID 635> was predicted to encode a protein having amino 
acid sequence <SEQ ID 636>: 

1 MSGVRFPSPA PIPSTDPPSG SLCFFTFPLQ TASDMWSSQR KRLSGRWLNS 

51 YERYRHRRLI HAVRLGGTVL FATALARLLH LQHGEWIGMT VFVVLGMLQF 

101 QGAIYSNAVE RMLGTVIGLG AGLGVLWLNQ HYFHGNLLFY LTIGTASALA 

151 GWAAVGKNGY VPMLAGLTMC MLIGDNGSEW LDSGLMRAMN VLIGAAIAIA 

201 AAKLLPLKST LMWRFMLADN LADCSKMIAE ISNGRRMTRE RLEQNMVKMR 

251 QINARMVKSR SHLAATSGES RISPSMMEAM QHAHRKIVNT TELLLTTAAK 

301 LQSPKLNGSE IRLLDRHFTL LQTDLQQTAA LINGRHARRI RIDTAINPEL 

351 EALAEHLHYQ WQGFLWLSTN MRQEISALVI PLQRTRRKWL DAHERQHLRQ 

401 SLLETREHG* 




Further work revealed the following gonococcal DNA sequence <SEQ ID 637>: 

1 ATGAACTCCT CGCAACGCAA ACGCCTTTCC GgccGCTGGC TCAACTCCTA 

51 CGAACGCTac cGCCaccGCC GCCTCATACA TGCCGTGCGG CTCGGCggaa 

101 ccgtCCTGTT CGCCACCGCA CTCGCCCGgc tACTCCACCT CCAacacggc 

151 gAATGGATAG GGAtgaCCGT CTTCGTCGTC CTCGGCATGC TCCAGTTCCA 

201 AGGCgcgatt tActccaacg cggtgGAacg taTGctcggt acggtcatcg 

251 ggctgGGCGC GGGTTTGGgc gTTTTATGGC TGAACCAGCA TTAtttccac 

301 ggcaacCTcc tcttctacct gaccatcggc acggcaagcg cactggccgg 

351 ctGGGCGGCG GTCGGCAAAA acggctacgt ccctatgctg GCGGGGctgA 

401 CGATGTGCAT gctcatcggc gACAACGGCA GCGAATGGCT CGACAGCGGC 

451 CTGATGCGCG CGATGAACGT GCTCATCGGC GCCGCCATCG CCATTGCCGC 

501 CGCCAAACTG CTGCCGCTGA AATCCACACT GATGTGGCGT TTCATGCTTG 

551 CCGACAACCT GGCCGACTGC AGCAAAATGA TTGCCGAAAT CAGCAACGGC 

601 AGGCGTATGA CGCGCGAACG TTTGGAGCAG AATATGGTCA AAATGCGCCA 

651 AATCAACGCA CGCATGGTCA AAAGCCGCAG CCACCTCGCC GCCACATCGG 

701 GCGAAAGCCG CATCAGCCCC TCCATGATGG AAGCCATGCA GCACGCCCAC 

751 CGCAAAATCG TCAACACCAC CGAGCTGCTC CTGACCACCG CCGCCAAGCT 

801 GCAATCTCCC AAACTCAACG GCAGCGAAAT CCGGCTGCTC GACCGCCACT 

851 TCACACTGCT CCAAACCGAC CTGCAACAAA CCGCCGCCCT CATCAACGGC 

901 AGACACGCCC GCCGCATCCG CATCGACACC GCCATCAACC CCGAACTGGA 

951 AGCCCTCGCC GAACACCTCC ACTACCAATG GCAGGGCTTC CTCTGGCTCA 

1001 GCACCAATAT GCGTCAGGAA ATTTCCGCCC TCGTCATCCT GCTGCAACGC 

1051 ACCCGCCGCA AATGGCTGGA TGCCCACGAA CGCCAACACC TGCGCCAAAG 

1101 CCTGCTTGAA ACACGGGAAC ACGGCTGA 

This corresponds to the amino acid sequence <SEQ ID 638; ORF146ng-l>: 



1 MNSSQRKRLS GRWLNSYERY RHRRLIHAVR LGGTVLFATA LARLLHLQHG 

51 EWIGMTVFVV LGMLQFQGAI YSNAVERMLG TVIGLGAGLG VLWLNQHYFH 

101 GNLLFYLTIG TASALAGWAA VGKNGYVPML AGLTMCMLIG DNGSEWLDSG 

151 LMRAMNVLIG AAIAIAAAKL LPLKSTLMWR FMLADNLADC SKMIAEISNG 

201 RRMTRERLEQ NMVKMRQINA RMVKSRSHLA ATSGESRISP SMMEAMQHAH 

251 RKIVNTTELL LTTAAKLQSP KLNGSEIRLL DRHFTLLQTD LQQTAALING 

301 RHARRIRIDT AINPELEALA EHLHYQWQGF LWLSTNMRQE ISALVILLQR 

351 TRRKWLDAHE RQHLRQSLLE TREHG* 

ORF146ng-1 and ORF146-1 show 96.5% identity in 375 aa overlap: 



orf 14 6-1 . pep MNTSQRNRLVSRWLNSYERYRYRRLIHAVRLGGAVLFATASARLLHLQHGEWIGMTVFW 
ll:MI:|l : I I I I I I I I I I: I I M I I M M I : [ t I I i [ I I M I M I t I I I i t M I I I 
orfl4 6ng-l MNSSQRKRLSGRWLNSYERYRHRRLIHAVRLGGTVLFATALARLLHLQHGEWIGMTVFW 



orf 14 6-1 . pep LGMLQFQGAI YSKAVERMLGTVIGLGAGLGVLWLNQHYFHGNLLFYLTVGTASALAGWAA 
I It I I I I t I I I I : I I I I I I I t [ I I I I I I I I I I i M I I I I I I I I I I M I : i I I I I I I I I I I 
orf 1 4 6ng-l LGMLQFQGAI YSNAVERMLGT VI GLGAGLGVLWLNQHY FHGNLL FY LT I GT AS ALAGWAA 



orf 14 6-1 . pep VGKNGYVPMLAGLTMCMLIGDNGSEWLDSGLMRAMNVLIGAAIAIAAAKLLPLKSTLMWR 
IIIIIMIIIMIIIIIIIIIIItMlitltlllllllltlilllllllinilllMII 
orfl4 6ng-l VGKNGYVPMLAGLTMCMLIGDNGSEWLDSGLMRAMNVLIGAAIAIAAAKLLPLKSTLMWR 



orf 14 6-1. pep FMLADNLADCSKMIAEISNGRRMTRERLEENMAKMRQINARMVKSRSHLAATSGESRISP 
I I I I I I ( t I i M It t t I I I I I i t I I M I I : I I : i t i i t I I I M I I M I I I I I I I I i I i I I 
orfl4 6ng-l FMLADNLADCSKMIAEISNGRRMTRERLEQNMVKMRQINARMVKSRSHLAATSGESRISP 



orf 14 6-1 . pep AMMEAMQHAHRKIVNTTELLLTTAAKLQSPKLNGSEIRLLDRHFTLLQTDLQQTVALING 
: I t I I I t I t II I I II M II I I I I I I I I I I II I t I t II I II II II I II I I I I I II : I I II I 
orfl46ng-l SMMEAMQHAHRKIVNTTELLLTTAAKLQSPKLNGSEIRLLDRHFTLLQTDLQQTAALING 

orf 14 6-1. pep RHARRIRIDTAINPELEALAEHLHYQWQGFLWLSTNMRQEISALVILLQRTRRKWLDAHE 
I I I I II I I I [ I I II I I i I I It M I I I I I i I I I I II I I I I I I t II I M I I I I I I I t t I I I I 
orfl4 6ng-l RHARRIRIDTAINPELEALAEHLHYQWQGFLWLSTNMRQEISALVILLQRTRRKWLDAHE 



orf 14 6- 1 . pep RQHLRQSLLETREHGX 
I I I M I I I I I I I I I I I 
orf 14 6ng-l RQHLRQSLLETREHGX 

Furthermore, ORF146ng-1 shows homology with a hypothetical E.coli protein: 

sp|P33011|YEEA_ECOLI HYPOTHETICAL 40.0 KD PROTEIN IN COBU-SBMC INTERGENIC REGION 
>gi|1736674|gnl|PID|dl016553 (D90838) ORF_ID:o348#20; similar to [SwissProt 
Accession Number P33011] [Escherichia coli] >gi|1736682|gnl|PID|dl016560 (D90839) 
ORF_ID:o348#20; similar to [SwissProt Accession Number P33011] [Escherichia coli] 
>gi|1788318 (AE000292) f352; 100% identical to fragment YEEA_ECOLI SW: P33011 but 
has 203 additional C-terminal residues (Escherichia coli) Length = 352 
Score = 109 bits (271), Expect = 2e-23 

Identities = 89/347 (25%), Positives = 150/347 (42%), Gaps = 21/347 (6%) 

Query: 20 YRHRRLIHAVRLGGTVLFATALARLLHLQHGEWIGMTVFWLGMLQFQGAIYSNAVERML 79 
YRH R++H R+ L + RL + W +T+ V++G + F G + A ER+ 
Sbjct: 15 YRHYRIVHGTRVALAFLLTFLIIRLFTIPESTWPLVTMWIMGPISFWGNWPR7VFERIG 74 

Query: 80 GTVIGLGAGLGVLWLNQHYFHGNLLFYLTIGTASALAGWAAVGKNGYVPMLAGLTMCMLI 139 
GTV+G GL L L L + A L GW A+GK Y +L G+T+ +++ 
Sbjct: 75 GTVLGSILGLIALQLE LISLPLMLVWCAAAMFLCGWLALGKKPYQGLLIGVTLAIW 131 

Query: 140 GDNGSEWLDSGLMRAMNVLIGXXXXXXXXKLLPLKSTLMWRFMLADNLADCSKMIAEISN 199 
G E +D+ L R+ +V++G + P ++ + WR LA +L + +++ + 
Sbjct: 132 

Query: 200 
+ R RLE ++ K+ VK R +A S E+RI S+ E +Q +R +V 
Sbjct: 191 PNLLERPRLESHLQKLL TDAVKMRGLIAPASKETRIPKSIYEGIQTINRNLVCMLEL 247 

Query: 260 XXXXXXXXQSPK LNGSEIRLLDRHFXXXXXXXXXXAALINGRHARRIRIDTAINPEL 316 
+ LN ++R D AL G +N + 
Sbjct: 248 QINAYWATRPSH FVLLNAQKLR -- DTQHMMQQI LLSLVHALYEGN PQPVFANTEKLNDAV 305 

Query: 317 EALAEHL -- HYQWQ GFLWLSTNMRQEISALVILLQRTRRK 354 
E L + L H+ + G++WL+ ++ L L+ R RK 
Sbjct: 306 EELRQLLNNHHDLKWETPIYGYVWLNMETAHQLELLSNLICRALRK 352 

On the basis of this analysis, including the identification of several transmembrane domains in the 
gonococcal protein, it is predicted that the proteins from N.meningitidis and N.gonorrhoeae, and 
their epitopes, could be useful antigens for vaccines or diagnostics, or for raising antibodies. 



Example 76 



The following partial DNA sequence was identified in N. meningitidis <SEQ ID 639>: 



1 ..GCCGAAGACA CGCGCGTTAC CGCACAGCTT TTGAGCGCGT ACGGCATTCA 

51 GGGCAAACTC GTCAGTGTGC GCGAACACAA CGAACGGCAG ATGGCGGACA 

101 AGATTGTCGG CTATCTTTCA GACGGCATGG TTGTGGCACA GGTTTCCGAT 

151 GCGGGTACGC CGGCCGTGTG CGACCCGGGC GCGAAACTCG CCCGCCGCGT 

201 GCGTGAGGCC GGGTTTAAAG TCGTTCCCGT CGTGGGCGCA AC.GCGGTGA 

251 TGGCGGCTTT GAGCGTGGCC GGTGTGGAAG GATCCGATTT TTATTTCAAC 

301 GGTTTTGTAC CGCCGAAATC GGGAGAACGC AGGAAACTGT TTGCCAAATG 

351 GGTGCGGGCG GCGTTTCCTA TCGTCATGTT TGAAACGCCG CACCGCATCG 

401 GTGCAGCGCT TGCCGATATG GCGGAACTGT TCCCCGAACG CCGATTAATG 

451 CTGGCGCGCG AAATTACGAA AACGTTTGAA ACGTTCTTAA GCGGCACGGT 

501 TGGGGAAATT CAGACGGCAT TGTCTGCCGA CGGCGACCAA TCGCGCGGCG 

551 AGATGGTGTT GGTGCTTTAT CCGGCGCAGG ATGAAAAACA CGAAGGCTTG 

601 TCCGAGTCCG CGCAAAACAT CATGAAAATC CTCACAGCCG AGCTGCCGAC 

651 CAAACAGGCG GCGGAGCTTG CTGCCAAAAT CACGGGCGAG GGAAAGAAAG 

701 CTTTGTACGA T. . 

This corresponds to the amino acid sequence <SEQ ID 640; ORF147>: 



1 ..AEDTRVTAQL LSAYGIQGKL VSVREHNERQ MADKIVGYLS DGMVVAQVSD 

51 AGTPAVCDPG AKLARRVREA GFKVVPVVGA XAVMAALSVA GVEGSDFYFN 

101 GFVPPKSGER RKLFAKWVRA AFPIVMFETP HRIGAALADM AELFPERRLM 

151 LAREITKTFE TFLSGTVGEI QTALSADGDQ SRGEMVLVLY PAQDEKHEGL 

201 SESAQNIMKI LTAELPTKQA AELAAKITGE GKKALYD. . 

Further work revealed the complete nucleotide sequence <SEQ ID 641>: 



1 ATGTTTCAGA AACATTTGCA GAAAGCCTCC GACAGCGTCG TCGGAGGGAC 

51 ATTATACGTG GTTGCCACGC CCATCGGCAA TTTGGCGGAC ATTACCCTGC 

101 GCGCTTTGGC GGTATTGCAA AAGGCGGACA TCATCTGTGC CGAAGACACG 

151 CGCGTTACCG CACAGCTTTT GAGCGCGTAC GGCATTCAGG GCAAACTCGT 
201 CAGTGTGCGC GAACACAACG AACGGCAGAT GGCGGACAAG ATTGTCGGCT 

251 ATCTTTCAGA CGGCATGGTT GTGGCACAGG TTTCCGATGC GGGTACGCCG 

301 GCCGTGTGCG ACCCGGGCGC GAAACTCGCC CGCCGCGTGC GTGAGGCCGG 

351 GTTTAAAGTC GTTCCCGTCG TGGGCGCAAG CGCGGTGATG GCGGCTTTGA 

401 GCGTGGCCGG TGTGGAAGGA TCCGATTTTT ATTTCAACGG TTTTGTACCG 

451 CCGAAATCGG GAGAACGCAG GAAACTGTTT GCCAAATGGG TGCGGGCGGC 

501 GTTTCCTATC GTCATGTTTG AAACGCCGCA CCGCATCGGT GCGACGCTTG 

551 CCGATATGGC GGAACTGTTC CCCGAACGCC GATTAATGCT GGCGCGCGAA 

601 ATTACGAAAA CGTTTGAAAC GTTCTTAAGC GGCACGGTTG GGGAAATTCA 

651 GACGGCATTG TCTGCCGACG GCAACCAATC GCGCGGCGAG ATGGTGTTGG 

701 TGCTTTATCC GGCGCAGGAT GAAAAACACG AAGGCTTGTC CGAGTCCGCG 

751 CAAAACATCA TGAAAATCCT CACAGCCGAG CTGCCGACCA AACAGGCGGC 

801 GGAGCTTGCT GCCAAAATCA CGGGCGAGGG AAAGAAAGCT TTGTACGATC 

851 TGGCTCTGTC TTGGAAAAAC AAATAG 

This corresponds to the amino acid sequence <SEQ ID 642; ORF147-1>: 

1 MFQKHLQKAS DSVVGGTLYV VATPIGNLAD ITLRALAVLQ KADIICAEDT 

51 RVTAQLLSAY GIQGKLVSVR EHNERQMADK IVGYLSDGMV VAQVSDAGTP 

101 AVCDPGAKLA RRVREAGFKV VPVVGASAVM AALSVAGVEG SDFYFNGFVP 

151 PKSGERRKLF AKWVRAAFPI VMFETPHRIG ATLADMAELF PERRLMLARE 

201 ITKTFETFLS GTVGEIQTAL SADGNQSRGE MVLVLYPAQD EKHEGLSESA 

251 QNIMKILTAE LPTKQAAELA AKITGEGKKA LYDLALSWKN K* 
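
Partial sequences in these examples are flagged with leading or trailing dots, whereas the complete sequences run from an ATG to a stop codon. As a rough illustration, the check below tests whether a nucleotide string has that shape. It assumes ATG as the only start codon and the three standard stop codons, which is a simplification for bacteria; the function name is hypothetical.

```python
STOP_CODONS = {"TAA", "TAG", "TGA"}


def is_complete_orf(dna):
    """True if dna starts with ATG, ends with a stop, and has no inner stop."""
    dna = "".join(dna.split()).upper()  # drop the spaces between 10-mers
    if len(dna) < 6 or len(dna) % 3 != 0:
        return False
    codons = [dna[i:i + 3] for i in range(0, len(dna), 3)]
    has_inner_stop = any(c in STOP_CODONS for c in codons[1:-1])
    return (
        codons[0] == "ATG"
        and codons[-1] in STOP_CODONS
        and not has_inner_stop
    )
```

A listing that begins mid-gene, such as one starting `GCCGAAGACACG`, fails the check because it lacks the initial ATG.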

Computer analysis of this amino acid sequence gave the following results: 

Homology with hypothetical protein ORF286 of E.coli (accession number U18997) 
ORF147 and E.coli ORF286 protein show 36% aa identity in 237aa overlap: 

Orf147: 1 AEDTRVTAQLLSAYGIQGKLVSVREHNERQMADKIVGYLSDGMWAQVSDAGTPAVCDPG 60 

AEDTR T LL +GI +L ++ +HNE+Q A+ ++ L +G +A VSDAGTP + DPG 
Orf286: 43 AEDTRHTGLLLQHFGINARLFALHDHNEQQKAETLLAKLQEGQNIALVSDAGTPLINDPG 102 

Orf147: 61 AKLARRVREXXXXXXXXXXXXXXXXXXXXXXXEGSDFYFNGFVPPKSGERRKLFAKWVRA 120 

L R RE F + GF+P KS RR 

Orf286: 103 YHLVRTCREAGIRWPLPGPCAAITALSAAGLPSDRFCYEGFLPAKSKGRRDALKAIEAE 162 

Orf147: 121 AFPIVMFETPHRIGAALADMAELFPERR-LMLAREITKTFETFLSGTVGEIQTALSADGD 179 

++ +E+ HR+ +L D+ + E R ++IARE+TKT+ET VGE+ + D + 

Orf286: 163 PRTLIFYESTHRLLDSLEDIVAVLGESRYWLARELTKTWETIHGAPVGELLAWVKEDEN 222 

Orf147: 180 QSRGEMVLVLYPAQDEKHEGLSESAQNIMKILTAELPTKQAAELAAKITGEGKKALY 236 

+ +GEMVL++ + E L A + +L AELP K+AA LAA+I G K ALY 

Orf286: 223 RRKGEMVLIV-EGHKAQEEDLPADAIi^TlALLQAELPLKKAAALAAEIHGVKKNALY 278 

Homology with a predicted ORF from N.meningitidis (strain A) 

ORF147 shows 96.6% identity over a 237aa overlap with ORF75a from strain A of N. meningitidis: 

10 20 30 
orf147.pep AEDTRVTAQLLSAYGIQGKLVSVREHNERQ 
I I I I I I I I I M I I M I I I I I I I I I I I I I I I 
orf75a TLYWATPIGNLADITLRALAVLQKADIICAEDTRVTAQLLSAYGIQGKLVSVREHNERQ 
20 30 40 50 60 70 

40 50 60 70 80 90 
orf147.pep MADKIVGYLSDGMWAQVSDAGTPAVCDPGAKLARRVREAGFKWPWGAXAVMAALSVA 
I i t I i M I I I I i I.I I I M t I I I I 1 I I I i I I I I I I I t t t I : I I I I I I I t I I I I II I I II I 
orf75a MADKIVGYLSDGMWAQVSDAGTPAVCDPGAKLARRVREVGFKWPWGASAVMAALSVA 
80 90 100 110 120 130 

100 110 120 130 140 150 
orf147.pep GVEGSDFYFNGFVPPKSGERRKLFAKWVRAAFPIVMFETPHRIGAALADMAELFPERRLM 
II I II I I I I II I I II M M I I I I I I I I I : I II : I I I i I I I I I I i : I I I tl i I I t I I I 11 
orf75a GVAGSDFYFNGFVPPKSGERRKLFAKWVRVAFPWMFETPHRIGATLADMAELFPERRLM 
140 150 160 170 180 190 

160 170 180 190 200 210 
orf147.pep LAREITKTFETFLSGTVGEIQTALSADGDQSRGEMVLVLYPAQDEKHEGLSESAQNIMKI 
I I I I M I I I i M I I M I I I I I t I I : I i I: I It I I I I I I I I I I I I I I I M I I I I I I I I I I I 
orf75a LAREITKTFETFLSGTVGEIQTALAADGNQSRGEMVLVLYPAQDEKHEGLSESAQNIMKI 
200 210 220 230 240 250 

220 230 
orf147.pep LTAELPTKQAAELAAKITGEGKKALYD 
[ I I I I I t t I I I I I i I t I I I I I I I I I I I 
orf75a LTAELPTKQAAELAAKITGEGKKALYDLALSWKNKX 
260 270 280 290 

ORF147a is identical to ORF75a, which includes aa 56-292 of ORF75. 
Homology with a predicted ORF from N.gonorrhoeae 

ORF147 shows 94.1% identity over a 237aa overlap with a predicted ORF (ORF147ng) from 
N.gonorrhoeae: 

or f 14 7 . pep AEDTRVTAQLLSAYGIQGKLVSVREHNERQ 30 

IIIIIMIIIIIilllllrilltltlllll 
orfl47ng TLYWATPIGNLADITLRALAVLQKADIICAEDTRVTAQLLSAYGIQGRLVSVREHNERQ 85 

orf 147 .pep MADKIVGYLSDGMWAQVSDAGTPAVCDPGAKLARRVREAGFKWPWGAXAVMAALSVA 90 

I I I I :: I : I I I I : I I t I M I I I I i I M I t I I I I I I I I I I I I I I i I t I I I I i I I I I M I I 

or f 1 4 7ng MADKVIGFLSDGLWAQVSDAGTPAVCDPGAKLARRVREAGFKWPWGASAVMAALSVA 145 

orf 147 . pep GVEGSDFYFNGFVPPKSGERRKLFAKWVRAAFPIVMFETPHRIGAALADMAELFPERRLM 150 

II I I I t I I i M I M I I I t II t t II I t I I I I t: I I I I I I I I I n : I I M M I 1 I i I II I 
orfl47ng GVAESDFYFNGFVPPKSGERRKLFAKWVRAAFPWMFETPHRIGATLADMAELFPERRLM 205 

orf 147 .pep LAREITKTFETFLSGTVGEIQTALSADGDQSRGEMVLVLYPAQDEKHEGLSESAQNIMKI 210 

I I t I II I I II I I I II II I I I I I 11 : I n : I I I I II I I II I II I M I I I II II M I I 111 
orfl47ng LAREITKTFETFLSGTVGEIQTALAADGNQSRGEMVLVLYPAQDEKHEGLSESAQNAMKI 265 

orf 147 . pep LTAELPTKQAAELAAKITGEGKKALYD 237 

I : II II I I t I I M 1 I II I I I I M I I I t 
orfl47ng LAAELPTKQAAELAAKITGEGKKALYDLALSWKNK 300 

An ORF147ng nucleotide sequence <SEQ ID 643> was predicted to encode a protein having amino 
acid sequence <SEQ ID 644>: 

1 MSVFQTAFFM FQKHLQKASD SVVGGTLYVV ATPIGNLADI TLRALAVLQK 

51 ADIICAEDTR VTAQLLSAYG IQGRLVSVRE HNERQMADKV IGFLSDGLVV 

101 AQVSDAGTPA VCDPGAKLAR RVREAGFKVV PVVGASAVMA ALSVAGVAES 

151 DFYFNGFVPP KSGERRKLFA KWVRAAFPVV MFETPHRIGA TLADMAELFP 

201 ERRLMLAREI TKTFETFLSG TVGEIQTALA ADGNQSRGEM VLVLYPAQDE 

251 KHEGLSESAQ NAMKILAAEL PTKQAAELAA KITGEGKKAL YDLALSWKNK 

301 * 

Further work revealed the following gonococcal DNA sequence <SEQ ID 645>: 

1 ATGTTTCAGA AACACTTGCA GAAAGCCTCC GACAGCGTCG TCGGAGGGAC 

51 ATTATACGTG GTTGCCACGC CCATCGGCAA TTTGGCAGAC ATTACCCTGC 

101 GCGCTTTGGC GGTATTGCAA AAGGCGGACA TCATTTGTGC CGAAGACACG 

151 CGCGTTACTG CGCAGCTTTT GAGCGCGTAC GGCATTCAGG GCAGGTTGGT 

201 CAGTGTGCGC GAACACAACG AGCGGCAGAT GGCGGACAAG GTAATCGGTT 

251 TCCTTTCAGA CGGCCTGGTT GTGGCGCAGG TTTCCGATGC GGGTACGCCG 

301 GCCGTGTGCG ACCCGGGCGC GAAACTCGCC CGCCGCGTGC GCGAAGCAGG 

351 GTTCAAAGTC GTTCCCGTCG TGGGCGCAAG CGCGGTAATG GCGGCGTTGA 

401 GTGTGGCCGG TGTGGCGGAA TCCGATTTTT ATTTCAACGG TTTTGTACCG 

451 CCGAAATCGG GCGAACGTAG GAAATTGTTT GCCAAATGGG TGCGGGCGGC 

501 ATTTCCTGTC GTCATGTTTG AAACGCCGCA CCGAATCGGG GCAACGCTTG 

551 CCGATATGGC GGAATTGTTC CCCGAACGCC GTCTGATGCT GGCGCGCGAA 

601 ATCACGAAAA CGTTTGAAAC GTTCTTAAGC GGCACGGTTG GGGAAATTCA 

651 GACGGCATTG GCGGCGGACG GCAACCAATC GCGCGGCGAG ATGGTGTTGG 

701 TGCTTTATCC GGCGCAGGAT GAAAAACACG AAGGCTTGTC CGAGTCTGCG 

751 CAAAATGCGA TGAAAATCCT TGCGGCCGAG CTGCCGACCA AGCAGGCGGC 

801 GGAGCTTGCC GCCAAGATTA CAGGTGAGGG CAAAAAGGCT TTGTACGATT 

851 TGGCACTGTC GTGGAAAAAC AAATGA 
This corresponds to the amino acid sequence <SEQ ID 646; ORF147ng-1>: 

1 MFQKHLQKAS DSVVGGTLYV VATPIGNLAD ITLRALAVLQ KADIICAEDT 

51 RVTAQLLSAY GIQGRLVSVR EHNERQMADK VIGFLSDGLV VAQVSDAGTP 

101 AVCDPGAKLA RRVREAGFKV VPVVGASAVM AALSVAGVAE SDFYFNGFVP 

151 PKSGERRKLF AKWVRAAFPV VMFETPHRIG ATLADMAELF PERRLMLARE 

201 ITKTFETFLS GTVGEIQTAL AADGNQSRGE MVLVLYPAQD EKHEGLSESA 

251 QNAMKILAAE LPTKQAAELA AKITGEGKKA LYDLALSWKN K* 
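The correspondence between a nucleotide sequence and its amino acid sequence can be checked by translating the DNA in frame 1 with the standard genetic code. The sketch below is illustrative only: the names `CODON_TABLE` and `translate` are hypothetical, and reporting any codon containing an unresolved base as `X` is an assumption chosen to mirror the `X` residues that appear in the listed proteins.

```python
from itertools import product

# Standard genetic code, codons enumerated in TCAG order.
BASES = "TCAG"
AMINO = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {"".join(c): a for c, a in zip(product(BASES, repeat=3), AMINO)}


def translate(dna: str) -> str:
    """Translate DNA in frame 1, stopping after the first stop codon ('*').

    Whitespace is ignored; codons with unresolved bases are reported as 'X'.
    """
    seq = "".join(dna.split()).upper()
    protein = []
    for i in range(0, len(seq) - 2, 3):
        aa = CODON_TABLE.get(seq[i:i + 3], "X")
        protein.append(aa)
        if aa == "*":
            break
    return "".join(protein)


# First 60 nucleotides of the gonococcal sequence <SEQ ID 645>.
start = "ATGTTTCAGA AACACTTGCA GAAAGCCTCC GACAGCGTCG TCGGAGGGAC ATTATACGTG"
print(translate(start))
```

Translating the first 60 nucleotides of <SEQ ID 645> in this way gives the first 20 residues of <SEQ ID 646>.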

ORF147ng shows homology to a hypothetical E.coli protein: 

sp|P45528|YRAL_ECOLI HYPOTHETICAL 31.3 KD PROTEIN IN AGAI-MTR INTERGENIC REGION 
(F286) 
>gi|606086 (U18997) ORF_f286 [Escherichia coli] 
>gi|1789535 (AE000395) hypothetical 31.3 kD protein in agai-mtr intergenic region 
[Escherichia coli] Length = 286 
Score = 218 bits (550), Expect = 3e-56 
Identities = 128/284 (45%), Positives = 171/284 (60%), Gaps = 4/284 (1%) 

Query: 4 KHLQKASDSVVGGTLYVVATPIGNLADITLRALAVLQKADIICAEDTRVTAQLLSAYGIQ 63 

K Q A +S G LY+V TPIGNLADIT RAL VLQ D+I AEDTR T LL +GI 
Sbjct: 2 KQHQSADNSQ--GQLYIVPTPIGNLADITQRALEVLQAVDLIAAEDTRHTGLLLQHFGIN 59 

Query: 64 GRLVSVREHNERQMADKVIGFLSDGLVVAQVSDAGTPAVCDPGAKLARRVREAGFKVVPV 123 

RL ++ +HNE+Q A+ ++ L +G +A VSDAGTP + DPG L R REAG +WP+ 
Sbjct: 60 ARLFALHDHNEQQKAETLLAKLQEGQNIALVSDAGTPLINDPGYHLVRTCREAGIRWPL 119 

Query: 124 VGASAVMAALSVAGVAESDFYFNGFVPPKSGERRKLFAKWVRAAFPVVMFETPHRIGATL 183 

G A + ALS AG+ F + GF+P KS RR ++ +E+ HR+ +L 

Sbjct: 120 PGPCAAITALSAAGLPSDRFCYEGFLPAKSKGRRDALKAIEAEPRTLIFYESTHRLLDSL 179 

Query: 184 ADMAELFPERR-LMLAREITKTFETFLSGTVGEIQTALAADGNQSRGEMVLVLYPAQDEK 242 

D+ + E R ++LARE+TKT+ET VGE+ + D N-f +GEMVL++ + 

Sbjct: 180 EDIVAVLGESRYWLARELTKTWETIHGAPVGELLAWVKEDENRRKGEMVLIV-EGHKAQ 238 

Query: 243 HEGLSESAQNAMKILAAELPTKQAAELAAKITGEGKKALYDLAL 286 

EL A + +L AELP K+AA LAA+I G K ALY AL 
Sbjct: 239 EEDLPADALRTLALLQAELPLKKAAALAAEIHGVKKNALYKYAL 282 

Based on the computer analysis and the presence of a putative transmembrane domain in the 
gonococcal protein, it is predicted that these proteins from N.meningitidis and N.gonorrhoeae, and 
their epitopes, could be useful antigens for vaccines or diagnostics, or for raising antibodies. 
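The source does not say which method was used to predict the transmembrane domain. As an illustration only, one widely used heuristic is a sliding-window average of the Kyte-Doolittle hydropathy scale; the 19-residue window and the 1.6 threshold below are conventional choices for that scale and are assumptions here, as are the function names.

```python
# Kyte-Doolittle hydropathy values for the 20 standard amino acids.
KYTE_DOOLITTLE = {
    "A": 1.8, "R": -4.5, "N": -3.5, "D": -3.5, "C": 2.5,
    "Q": -3.5, "E": -3.5, "G": -0.4, "H": -3.2, "I": 4.5,
    "L": 3.8, "K": -3.9, "M": 1.9, "F": 2.8, "P": -1.6,
    "S": -0.8, "T": -0.7, "W": -0.9, "Y": -1.3, "V": 4.2,
}


def hydropathy_profile(protein: str, window: int = 19) -> list:
    """Mean hydropathy of each window; unknown residues (e.g. 'X') score 0."""
    scores = [KYTE_DOOLITTLE.get(aa, 0.0) for aa in protein.upper()]
    return [
        sum(scores[i:i + window]) / window
        for i in range(len(scores) - window + 1)
    ]


def hydrophobic_windows(protein: str, window: int = 19, threshold: float = 1.6):
    """Return 1-based (start, end) positions of windows at or above threshold."""
    profile = hydropathy_profile(protein, window)
    return [(i + 1, i + window) for i, s in enumerate(profile) if s >= threshold]


# Hypothetical usage on a short stretch of <SEQ ID 646> (an 11-residue window
# is used here only because the example fragment is short).
fragment = "GFKVVPVVGASAVMAALSVAGVAESDFYF"
print(hydrophobic_windows(fragment, window=11))
```

A window that scores above the threshold is only a candidate membrane-spanning segment; it does not by itself establish surface exposure.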



Example 77 

The following partial DNA sequence was identified in N.meningitidis <SEQ ID 647>: 

1 ATGAAAACAA CCGACAAACG GACAACCGAA ACACACCGCA AAGCCCCGAA 

51 AACCGGTCGC ATCCGCTTCT C.GCTGCTTA CTTAGCCATA TGCCTGTCGT 

101 TCGGCATTCT TCCCCAAGCC TGGGCGGGAC ACACTTATTT CGGCATCAAC 

151 TACCAATACT ATCGCGACTT TGCCGAAAAT AAAGGCAAGT TTGCAGTCGG 

201 GGCGAAAGAT ATTGAGGTTT ACAACAAAAA AGGGGAGTTG GTCGGCAAAT 

251 CAATGACAAA AGCCCCGATG ATTGATTTTT CTGTGGTGTC GCGTAACGGC 

301 GTGGCGGcAT TGGTG(^CGt AJCAATATAT TGTGAGCGTG GCACATAACG 

351 GCGGCTATAA CAACGTTGAT TTTGGTGCGG AAGGAAk.AA tATCCC.GAT 

401 CAACAwCGww TTACTTATAA AATTGTGAAA CGGAATAATT ATAAAGCAGG 

451 GACTAAAGGC CATCCTTATG GCGGCGATTA TCATATGCCG CGTTTGCATA 

501 AATwTGTCAC AGATGCAGAA CCTGTTGAAA TGACCAGTTA TATGGATGGG 

551 CGGAAATATA TCGATCAAAA TAATTACCCT GACCGTGTTC GTATTGGGGC 

601 AGGCAGGCAA TATTGGCGAT CTGATGAAGA TGAGCCCAAT AACCGCGAAA. 

651 GTTCATATCA TATTGCAAGT 

701 GGCTC ACCAATGTTT ATCTATGATG CCCAAAAGCA 

751 AAAGTGGTTA ATTAATGGGG TATTGCAAAC GGGCAACCCC TATATAGGAA 

801 AAAGCAATGG CTTCCAGCTG GTTCGTAAAG ATTGGTTCTA TGATGAAATC 

851 TTTGCTGGAG ATACCCATTC AGTATTCTAC GAACCACGTC AAAATGGGAA 

901 ATACTCTTTT AACGACGATA ATAATGGCAC AGGAAAAATC AATGCCAAAC 
951 ATGAACACAA TTCTCTGCCT AATAGATTAA AAACACGAAC CGTTCAATTG 
1001 TTTAATGTTT CTTTATCCGA GACAGCAAGA GAACCTGTTT ATCATGCTGC 
1051 AGGTGGTGTC AACAGTTATC GACCCAGACT GAATAATGGA GAAAATATTT 
1101 CCTTTATTGA CGAAGGAAAA GGCGAATTGA TACTTACCAG CAACATCAAT 
1151 CAAGGTGCTG GAGGATTATA TTTCCAAGGA GATTTTACGG TCTCGCCTGA 
1201 AAATAACGAA ACTTGGCAAG GCGCGGGCGT TCATATCAGT GAAGACAGTA 
1251 CCGTTACTTG GAAAGTAAAC GGCGTGGCAA ACGACCGCCT GTCCAAAATC 
1301 GGCAAAGGCA CGCTG 

// 

2101                                             ...GATAAAG 
2151 TGACTGCTTC ATTGACTAAG ACCGACATCA GCGGCAATGT CGATCTTGCC 
2201 GATCACGCTC ATTTAAATCT CACAGGGCTT GCCACACTCA ACGGCAATCT 
2251 TAGTGCAAAT GGCGATACAC GTTATACAGT CAGCCACAAC GCCACCCAAA 
2301 ACGGCAACCk TAgCCtCGtG G.sAATGcCC AAGCAACATT TAATCAAGCC 
2351 ACATTAAACG GCAACACATC GGCTTCgGGC AATGCTTCAT TTAATCTAAG 
2401 CGACCACGCC GTACAAAACG GCAGTCTGAC GCTTTCCGGC AACGCTAAGG 
2451 CAAACGTAAG CCATTCCGCA CTCAACGGTA ATGTCTCCCT AGCCGATAAG 
2501 GCAGTATTCC ATTTTGAAAG CAGCCGCTTT ACCGGACAAA TCAGCGGCGG 
2551 CAagGATACG GCATTACACT TAAAAGACAG CGAATGGACG CTGCCGTCAg 
2601 GarCGGAATT AGGCAATTTA AACCTTGACA ACGCCACCAT TACaCTCAAT 
2651 TCCGCCTATC GCCACGATGC GGCAGGGGCG CAAACCGGCA GTGCGACAGA 
2701 TGCGCCGCGC CGCCGTTCGC GCCGTTCGCG CCGTTCCCTA TTATmCGTTA 
2751 CACCGCCAAC TTCGGTAGAA TCCCGTTTCA ACACGCTGAC GGTAAACGGC 
2801 AAATTGAACG GTCAGGGAAC ATTCCGCTTT ATGTCGGAAC TCTTCGGCTA 
2851 CCGCAGCGAC AAATTGAAGC TGGCGGAAAG TTCCGAAGGC ACTTACACCT 
2901 TGGCGGTCAA CAATACCGGC AACGAACCTG CAAGCCTCGA ACAATTGACG 
2951 GTAGTGGAAG GAAAAGACAA CAAACCGCTG TCCGAAAACC TTAATTTCAC 
3001 CCTGCAAAAC GAACACGTCG ATGCAGGCGC GTGG 

// 

3551                                                 TTAGAC 
3601 CGCGTATTTG CCGAAGACCG CCGCAACGCC GTTTGGACAA GCGGCATCCG 
3651 GGACACCAAA CACTACCGTT CGCAAGATTT CCGCGCCTAC CGCCAACAAA 
3701 CCGACCTGCG CCAAATCGGT ATGCAGAAAA ACCTCGGCAG CGGGCGCGTC 
3751 GGCATCCTGT TTTCGCACAA CCGGACCGAA AACACCTTCG ACGACGGCAT 
3801 CGGCAACTCG GCACGGCTTG CCCACGGCGC CGTTTTCGGG CAATACGGCA 
3851 TCGACAGGTT CTACATCGGC ATCAGgCGCG GGCGCGGGTT TTAGCAGCGG 
3901 CAGCCTTTcA GACGGCATCG GAGsmAAAwT CCGCCGCCGC GTGCtGCATT 
3951 ACGGCATTCA GGCACGAtAC CGCGCCGgtt tCggCGgATt CGGCATCGAA 
4001 CCGCACATCG GCGCAACGCg CtATTTCGTC CAAAAAGCGG ATTACCGCTA 
4051 CGAAAACGTC AATATCGCCA CCCCCGGCCT TGCATTCAAC CGcTACCGCG 
4101 CGGGCATTAa GGCAGATTAT TCATTCAAAC CGGCGCAACA CATTTCCATC 
4151 ACGCCTTATT TGAGCCTGTC CTATACCGAT GCCGCTTCGG GCAAAGTCCG 
4201 AACACGCGTC AATACCGCCG TATTGGCTCA GGATTTCGGC AAAACCCGCA 
4251 GTGCGGAATG GGgCGTAAAC GCCGAAATCA AAGGTTTCAC GCTGTCCCTC 
4301 CACGCTGCCG CCGCCAAAGG CCCGCAACTG GAAGCGCAAC ACAGCGCGGG 
4351 CATCAAATTA GGCTACCGCT GGTAA... 



This corresponds to the amino acid sequence <SEQ ID 648; ORF1>: 



1 MKTTDKRTTE THRKAPKTGR IRFXAAYLAI CLSFGILPQA WAGHTYFGIN 

51 YQYYRDFAEN KGKFAVGAKD IEVYNKKGEL VGKSMTKAPM IDFSVVSRNG 

101 VAALVGVQYI VSVAHNGGYN NVDFGAEGXN IXDQXRXTYK IVKRNNYKAG 

151 TKGHPYGGDY HMPRLHKXVT DAEPVEMTSY MDGRKYIDQN NYPDRVRIGA 

201 GRQYWRSDED EPNNRESSYH IAS GS PMFIYDAQKQ 

251 KWLINGVLQT GNPYIGKSNG FQLVRKDWFY DEIFAGDTHS VFYEPRQNGK 

301 YSFNDDNNGT GKINAKHEHN SLPNRLKTRT VQLFNVSLSE TAREPVYHAA 

351 GGVNSYRPRL NNGENISFID EGKGELILTS NINQGAGGLY FQGDFTVSPE 

401 NNETWQGAGV HISEDSTVTW KVNGVANDRL SKIGKGTL 

// 

701 DKVTAS LTKTDISGNV DLADHAHLNL TGLATLNGNL 

751 SANGDTRYTV SHNATQNGNX SLVXNAQATF NQATLNGNTS ASGNASFNLS 

801 DHAVQNGSLT LSGNAKANVS HSALNGNVSL ADKAVFHFES SRFTGQISGG 

851 KDTALHLKDS EWTLPSGXEL GNLNLDNATI TLNSAYRHDA AGAQTGSATD 

901 APRRRSRRSR RSLLXVTPPT SVESRFNTLT VNGKLNGQGT FRFMSELFGY 

951 RSDKLKLAES SEGTYTLAVN NTGNEPASLE QLTVVEGKDN KPLSENLNFT 

1001 LQNEHVDAGA W 

// 

1151 LDRVFAEDR 

1201 RNAVWTSGIR DTKHYRSQDF RAYRQQTDLR QIGMQKNLGS GRVGILFSHN 

1251 RTENTFDDGI GNSARLAHGA VFGQYGIDRF YIGISAGAGF SSGSLSDGIG 

1301 XKXRRRVLHY GIQARYRAGF GGFGIEPHIG ATRYFVQKAD YRYENVNIAT 

1351 PGLAFNRYRA GIKADYSFKP AQHISITPYL SLSYTDAASG KVRTRVNTAV 
1401 LAQDFGKTRS AEWGVNAEIK GFTLSLHAAA AKGPQLEAQH SAGIKLGYRW 
1451 * 

Further sequencing analysis revealed the complete nucleotide sequence <SEQ ID 649>: 



1 ATGAAAACAA CCGACAAACG GACAACCGAA ACACACCGCA AAGCCCCGAA 
51 AACCGGCCGC ATCCGCTTCT CGCCTGCTTA CTTAGCCATA TGCCTGTCGT 
101 TCGGCATTCT TCCCCAAGCC TGGGCGGGAC ACACTTATTT CGGCATCAAC 
151 TACCAATACT ATCGCGACTT TGCCGAAAAT AAAGGCAAGT TTGCAGTCGG 
201 GGCGAAAGAT ATTGAGGTTT ACAACAAAAA AGGGGAGTTG GTCGGCAAAT 
251 CAATGACAAA AGCCCCGATG ATTGATTTTT CTGTGGTGTC GCGTAACGGC 
301 GTGGCGGCAT TGGTGGGCGA TCAATATATT GTGAGCGTGG CACATAACGG 
351 CGGCTATAAC AACGTTGATT TTGGTGCGGA AGGAAGAAAT CCCGATCAAC 
401 ATCGTTTTAC TTATAAAATT GTGAAACGGA ATAATTATAA AGCAGGGACT 
451 AAAGGCCATC CTTATGGCGG CGATTATCAT ATGCCGCGTT TGCATAAATT 
501 TGTCACAGAT GCAGAACCTG TTGAAATGAC CAGTTATATG GATGGGCGGA 
551 AATATATCGA TCAAAATAAT TACCCTGACC GTGTTCGTAT TGGGGCAGGC 
601 AGGCAATATT GGCGATCTGA TGAAGATGAG CCCAATAACC GCGAAAGTTC 
651 ATATCATATT GCAAGTGCGT ATTCTTGGCT CGTTGGTGGC AATACCTTTG 
701 CACAAAATGG ATCAGGTGGT GGCACAGTCA ACTTAGGTAG TGAAAAAATT 
751 AAACATAGCC CATATGGTTT TTTACCAACA GGAGGCTCAT TTGGCGACAG 
801 TGGCTCACCA ATGTTTATCT ATGATGCCCA AAAGCAAAAG TGGTTAATTA 
851 ATGGGGTATT GCAAACGGGC AACCCCTATA TAGGAAAAAG CAATGGCTTC 
901 CAGCTGGTTC GTAAAGATTG GTTCTATGAT GAAATCTTTG CTGGAGATAC 
951 CCATTCAGTA TTCTACGAAC CACGTCAAAA TGGGAAATAC TCTTTTAACG 
1001 ACGATAATAA TGGCACAGGA AAAATCAATG CCAAACATGA ACACAATTCT 
1051 CTGCCTAATA GATTAAAAAC ACGAACCGTT CAATTGTTTA ATGTTTCTTT 
1101 ATCCGAGACA GCAAGAGAAC CTGTTTATCA TGCTGCAGGT GGTGTCAACA 
1151 GTTATCGACC CAGACTGAAT AATGGAGAAA ATATTTCCTT TATTGACGAA 
1201 GGAAAAGGCG AATTGATACT TACCAGCAAC ATCAATCAAG GTGCTGGAGG 
1251 ATTATATTTC CAAGGAGATT TTACGGTCTC GCCTGAAAAT AACGAAACTT 
1301 GGCAAGGCGC GGGCGTTCAT ATCAGTGAAG ACAGTACCGT TACTTGGAAA 
1351 GTAAACGGCG TGGCAAACGA CCGCCTGTCC AAAATCGGCA AAGGCACGCT 
1401 GCACGTTCAA GCCAAAGGGG AAAACCAAGG CTCGATCAGC GTGGGCGACG 
1451 GTACAGTCAT TTTGGATCAG CAGGCAGACG ATAAAGGCAA AAAACAAGCC 
1501 TTTAGTGAAA TCGGCTTGGT CAGCGGCAGG GGTACGGTGC AACTGAATGC 
1551 CGATAATCAG TTCAACCCCG ACAAACTCTA TTTCGGCTTT CGCGGCGGAC 
1601 GTTTGGATTT AAACGGGCAT TCGCTTTCGT TCCACCGTAT TCAAAATACC 
1651 GATGAAGGGG CGATGATTGT CAACCACAAT CAAGACAAAG AATCCACCGT 
1701 TACCATTACA GGCAATAAAG ATATTGCTAC AACCGGCAAT AACAACAGCT 
1751 TGGATAGCAA AAAAGAAATT GCCTACAACG GTTGGTTTGG CGAGAAAGAT 
1801 ACGACCAAAA CGAACGGGCG GCTCAACCTT GTTTACCAGC CCGCCGCAGA 
1851 AGACCGCACC CTGCTGCTTT CCGGCGGAAC AAATTTAAAC GGCAACATCA 
1901 CGCAAACAAA CGGCAAACTG TTTTTCAGCG GCAGACCAAC ACCGCACGCC 
1951 TACAATCATT TAAACGACCA TTGGTCGCAA AAAGAGGGCA TTCCTCGCGG 
2001 GGAAATCGTG TGGGACAACG ACTGGATCAA CCGCACATTT AAAGCGGAAA 
2051 ACTTCCAAAT TAAAGGCGGA CAGGCGGTGG TTTCCCGCAA TGTTGCCAAA 
2101 GTGAAAGGCG ATTGGCATTT GAGCAATCAC GCCCAAGCAG TTTTTGGTGT 
2151 CGCACCGCAT CAAAGCCACA CAATCTGTAC ACGTTCGGAC TGGACGGGTC 
2201 TGACAAATTG TGTCGAAAAA ACCATTACCG ACGATAAAGT GATTGCTTCA 
2251 TTGACTAAGA CCGACATCAG CGGCAATGTC GATCTTGCCG ATCACGCTCA 
2301 TTTAAATCTC ACAGGGCTTG CCACACTCAA CGGCAATCTT AGTGCAAATG 
2351 GCGATACACG TTATACAGTC AGCCACAACG CCACCCAAAA CGGCAACCTT 
2401 AGCCTCGTGG GCAATGCCCA AGCAACATTT AATCAAGCCA CATTAAACGG 
2451 CAACACATCG GCTTCGGGCA ATGCTTCATT TAATCTAAGC GACCACGCCG 
2501 TACAAAACGG CAGTCTGACG CTTTCCGGCA ACGCTAAGGC AAACGTAAGC 
2551 CATTCCGCAC TCAACGGTAA TGTCTCCCTA GCCGATAAGG CAGTATTCCA 
2601 TTTTGAAAGC AGCCGCTTTA CCGGACAAAT CAGCGGCGGC AAGGATACGG 
2651 CATTACACTT AAAAGACAGC GAATGGACGC TGCCGTCAGG CACGGAATTA 
2701 GGCAATTTAA ACCTTGACAA CGCCACCATT ACACTCAATT CCGCCTATCG 
2751 CCACGATGCG GCAGGGGCGC AAACCGGCAG TGCGACAGAT GCGCCGCGCC 
2801 GCCGTTCGCG CCGTTCGCGC CGTTCCCTAT TATCCGTTAC ACCGCCAACT 
2851 TCGGTAGAAT CCCGTTTCAA CACGCTGACG GTAAACGGCA AATTGAACGG 
2901 TCAGGGAACA TTCCGCTTTA TGTCGGAACT CTTCGGCTAC CGCAGCGACA 
2951 AATTGAAGCT GGCGGAAAGT TCCGAAGGCA CTTACACCTT GGCGGTCAAC 
3001 AATACCGGCA ACGAACCTGC AAGCCTCGAA CAATTGACGG TAGTGGAAGG 
3051 AAAAGACAAC AAACCGCTGT CCGAAAACCT TAATTTCACC CTGCAAAACG 
3101 AACACGTCGA TGCCGGCGCG TGGCGTTACC AACTCATCCG CAAAGACGGC 
3151 GAGTTCCGCC TGCATAATCC GGTCAAAGAA CAAGAGCTTT CCGACAAACT 
3201 CGGCAAGGCA GAAGCCAAAA AACAGGCGGA AAAAGACAAC GCGCAAAGCC 
3251 TTGACGCGCT GATTGCGGCC GGGCGCGATG CCGTCGAAAA GACAGAAAGC 
3301 GTTGCCGAAC CGGCCCGGCA GGCAGGCGGG GAAAATGTCG GCATTATGCA 
3351 GGCGGAGGAA GAGAAAAAAC GGGTGCAGGC GGATAAAGAC ACCGCCTTGG 

3401 CGAAACAGCG CGAAGCGGAA ACCCGGCCGG CTACCACCGC CTTCCCCCGC 

3451 GCCCGCCGCG CCCGCCGGGA TTTGCCGCAA CTGCAACCCC AACCGCAGCC 

3501 CCAACCGCAG CGCGACCTGA TCAGCCGTTA TGCCAATAGC GGTTTGAGTG 

3551 AATTTTCCGC CACGCTCAAC AGCGTTTTCG CCGTACAGGA CGAATTAGAC 

3601 CGCGTATTTG CCGAAGACCG CCGCAACGCC GTTTGGACAA GCGGCATCCG 

3651 GGACACCAAA CACTACCGTT CGCAAGATTT CCGCGCCTAC CGCCAACAAA 

3701 CCGACCTGCG CCAAATCGGT ATGCAGAAAA ACCTCGGCAG CGGGCGCGTC 

3751 GGCATCCTGT TTTCGCACAA CCGGACCGAA AACACCTTCG ACGACGGCAT 

3801 CGGCAACTCG GCACGGCTTG CCCACGGCGC CGTTTTCGGG CAATACGGCA 

3851 TCGACAGGTT CTACATCGGC ATCAGCGCGG GCGCGGGTTT TAGCAGCGGC 

3901 AGCCTTTCAG ACGGCATCGG AGGCAAAATC CGCCGCCGCG TGCTGCATTA 

3951 CGGCATTCAG GCACGATACC GCGCGGGTTT CGGCGGATTC GGCATCGAAC 

4001 CGCACATCGG CGCAACGCGC TATTTCGTCC AAAAAGCGGA TTACCGCTAC 

4051 GAAAACGTCA ATATCGCCAC CCCCGGCCTT GCATTCAACC GCTACCGCGC 

4101 GGGCATTAAG GCAGATTATT CATTCAAACC GGCGCAACAC ATTTCCATCA 

4151 CGCCTTATTT GAGCCTGTCC TATACCGATG CCGCTTCGGG CAAAGTCCGA 

4201 ACACGCGTCA ATACCGCCGT ATTGGCTCAG GATTTCGGCA AAACCCGCAG 

4251 TGCGGAATGG GGCGTAAACG CCGAAATCAA AGGTTTCACG CTGTCCCTCC 

4301 ACGCTGCCGC CGCCAAAGGC CCGCAACTGG AAGCGCAACA CAGCGCGGGC 

4351 ATCAAATTAG GCTACCGCTG GTAA 

This corresponds to the amino acid sequence <SEQ ID 650; ORF1-1>: 

1 MKTTDKRTTE THRKAPKTGR IRFSPAYLAI CLSFGILPQA WAGHTYFGIN 

51 YQYYRDFAEN KGKFAVGAKD IEVYNKKGEL VGKSMTKAPM IDFSVVSRNG 

101 VAALVGDQYI VSVAHNGGYN NVDFGAEGRN PDQHRFTYKI VKRNNYKAGT 

151 KGHPYGGDYH MPRLHKFVTD AEPVEMTSYM DGRKYIDQNN YPDRVRIGAG 

201 RQYWRSDEDE PNNRESSYHI ASAYSWLVGG NTFAQNGSGG GTVNLGSEKI 

251 KHSPYGFLPT GGSFGDSGSP MFIYDAQKQK WLINGVLQTG NPYIGKSNGF 

301 QLVRKDWFYD EIFAGDTHSV FYEPRQNGKY SFNDDNNGTG KINAKHEHNS 

351 LPNRLKTRTV QLFNVSLSET AREPVYHAAG GVNSYRPRLN NGENISFIDE 

401 GKGELILTSN INQGAGGLYF QGDFTVSPEN NETWQGAGVH ISEDSTVTWK 

451 VNGVANDRLS KIGKGTLHVQ AKGENQGSIS VGDGTVILDQ QADDKGKKQA 

501 FSEIGLVSGR GTVQLNADNQ FNPDKLYFGF RGGRLDLNGH SLSFHRIQNT 

551 DEGAMIVNHN QDKESTVTIT GNKDIATTGN NNSLDSKKEI AYNGWFGEKD 

601 TTKTNGRLNL VYQPAAEDRT LLLSGGTNLN GNITQTNGKL FFSGRPTPHA 

651 YNHLNDHWSQ KEGIPRGEIV WDNDWINRTF KAENFQIKGG QAVVSRNVAK 

701 VKGDWHLSNH AQAVFGVAPH QSHTICTRSD WTGLTNCVEK TITDDKVIAS 

751 LTKTDISGNV DLADHAHLNL TGLATLNGNL SANGDTRYTV SHNATQNGNL 

801 SLVGNAQATF NQATLNGNTS ASGNASFNLS DHAVQNGSLT LSGNAKANVS 

851 HSALNGNVSL ADKAVFHFES SRFTGQISGG KDTALHLKDS EWTLPSGTEL 

901 GNLNLDNATI TLNSAYRHDA AGAQTGSATD APRRRSRRSR RSLLSVTPPT 

951 SVESRFNTLT VNGKLNGQGT FRFMSELFGY RSDKLKLAES SEGTYTLAVN 

1001 NTGNEPASLE QLTVVEGKDN KPLSENLNFT LQNEHVDAGA WRYQLIRKDG 

1051 EFRLHNPVKE QELSDKLGKA EAKKQAEKDN AQSLDALIAA GRDAVEKTES 

1101 VAEPARQAGG ENVGIMQAEE EKKRVQADKD TALAKQREAE TRPATTAFPR 

1151 ARRARRDLPQ LQPQPQPQPQ RDLISRYANS GLSEFSATLN SVFAVQDELD 

1201 RVFAEDRRNA VWTSGIRDTK HYRSQDFRAY RQQTDLRQIG MQKNLGSGRV 

1251 GILFSHNRTE NTFDDGIGNS ARLAHGAVFG QYGIDRFYIG ISAGAGFSSG 

1301 SLSDGIGGKI RRRVLHYGIQ ARYRAGFGGF GIEPHIGATR YFVQKADYRY 

1351 ENVNIATPGL AFNRYRAGIK ADYSFKPAQH ISITPYLSLS YTDAASGKVR 

1401 TRVNTAVLAQ DFGKTRSAEW GVNAEIKGFT LSLHAAAAKG PQLEAQHSAG 

1451 IKLGYRW* 

Computer analysis of these sequences gave the following results: 
Homology with a predicted ORF from N.meningitidis (strain A) 

ORF1 shows 57.8% identity over a 1456aa overlap with an ORF (ORF1a) from strain A of 
N.meningitidis: 

10 20 30 40 50 60 

orf 1 . pep MKTTDKRTTETHRKAPKTGR IRFXAAYIAICLSFGIL PQAWAGHTYFGINYQYYRDFAEN 
I I I I M M I I I I I I I I I I I I I t I I I t t I I M I M I i t I I I I 1 I It I I I I I I I I I I I I t 

orf la MKTTDKRTTETHRKAPKTGR IRFSPAYLAICLSFGIL PQAWAGHTYFGIWYQYYRDFAEN 

10 20 30 40 50 60 



70 80 90 100 110 120 
orf 1 . pep KGKFAVGAKDIEVYNKKGELVGKSMTKAPMIDFSVVSRNGVAALVGVQYIVSVAHNGGYN 
I I I I M I I I I I I I I I I I I t I I I I I I I I I [ I M It I I I I I I I I I I t I I I I I M i I I I I I I 
orf la KGKFAVGAKDIEVYNKKGELVGKSMTKAPMIDFSVVSRNGVAALVGDQYIVSVAHNGGYN 
70 80 90 100 110 120 






130 140 150 160 170 180 

orf 1 . pep NVDFGAEGXNIXDQXRXTYKIVKRNNYKAGTKGHPYGGDYHMPRLHKXVTDAEPVEMTSY 

I I I I I I I I I I t I I : I : I I I 1 I I I I : : I I I : i t I I I t I I I I I I I I I I I I f I 
orf la NVDFGAEGXN-PDQHRFSYQIVKRNNYKPDNS-HPYNGDXHMPRLHKFVTDAEPVEMTSD 

130 140 150 160 170 

190 200 210 

orf 1 . pep MDGRKYIDQNNYPDRVRIGAGRQYWRSDEDEP NN 

I I I l:::l|:lllll:l::lll II 
orf la MRGNTYSDKEKYPERVRIGSGHHYWRYDDDKHGDLSYSGAWLIGGNTHMQGWGNNGVXSL 
180 190 200 210 220 230 






220 230 240 250 260 

orf 1 . pep RESSYH lA SGS PMFIYDAQKQKWLINGVLQTGNPYIGKSNGFQLVRK 

1::: : II tllllllll :: II f : II I I I I I I I I : I I I II : I I 

orf la SGDVRHANDYGPMPIAGAAGDSGSPMFIYDKTNNKWLLNGVLQTGYPYSGRENGFQLIRK 
240 250 260 270 280 290 






270 280 290 300 310 320 

or f 1 . pep DWFYDEIFAGDTHSVFYEPRQNGKYSFNDDNNGTGKINAKHEHNSLPNRLKTRTVQLFNV 

I t I I I : I : M I I : t : I I I : I I : : I I : : : I | | I I : : : I : | | : I I : : | | : I I : 
orf la DWFYDDIYRGDTHTVXFEPRSNGHFSFTSNNNGTGTVTETNEKVSNP-KLKVQTVRLFDE 
300 310 320 330 340 350 






330 340 350 360 370 380 

orf 1 . pep SLSETAREPVYHAAGGVNSYRPRLNNGENISFIDEGKGELILTSNINQGAGGLYFQGDFT 

11:11 : t II I I I I II t : I I I I I I I I I I : I I I I I : I : Ml :: I I I II II II I I : II I I 
orf la SLNETDKEPVY-AAGGVNQYRPRLNNGENLSFIDYGNGKLILSNNINQGAGGLYFEGDFT 

360 370 380 390 400 410 

390 400 410 420 430 

orf 1 . pep VSPENNETWQGAGVHISEDSTVTWKVNGVANDRLSKIGKGTL 

I I I I I I I I I I I I t I I I I I I I I I II I I I I II I I t I t I I I I t I I 
orf la VSPENNETWQGAGVHISEDSTVTWKVNGVANDRLSKIGKGTLHVQAKGENQGSISVGDGT 
420 430 440 450 460 470 






orf 1 .pep 
orf la 



VILDQQADDKGKKQAFSEIGLXSGRGTVQLNADNQFNPDKLYFGFRGGRLDLNGHSLSFH 
480 490 500 510 520 530 



orf 1. pep 



orf la RIQNTDEGAMIXXHNATTTSTVTITGNESITQPSGKNINRLNYSKEIAYNGWFGEKDTTK 

540 550 560 570 580 590 



orf 1. pep 




orf la TNGRLNLVYQPAAEDRTXLLSGGTNLNGNITQTNGKLFFSGRPTPHAYNHLGSGWSKMEG 
600 610 620 630 640 650 



orf 1. pep 

orf la IPQGEIVWDNDWIXRTFKAENFHIQGGQAVISRNVAKVEGDXHLSNHAQAVFGVAPHQSH 
660 670 680 690 700 710 

440 450 460 470 480 

orf 1 . pep XXXXXDKVTASLTKTDISGNVDLADHAHLNLTGLATLNGNLSAN 

: 11:11111111111111 I : I I : I I I I t i I 

orf la TICTRSDWTGLTNCVEXXITDDKVIASLTKTDXSGXVXLXXXXXXXLXGXAXLXGNLSAN 
720 730 740 750 760 770 






490 500 510 520 530 540 

or f 1 . pep GDTRYTVSHNATQNGNXSLVXNAQATFNQATLNGNTSASGNASFNLSDHAVQNGSLTLSG 



wo 99/24578 



-363- 



PCT/IB98/01665 



I I I M I I I I I I I I I I I III I II I I I I I I II II I : I III II III I:: 1:111 II lit 
orf la GDTRYTVSHNATQNGNLSLVGNAQATFNQATLNGNXSXSGNASFNLSNNAAQNGSLTLSD 
780 790 800 810 820 830 






550 560 570 580 590 600 

orf 1 . pep NAKANVSHSALNGNVSLADKAVFHFESSRFTGQISGGKDTALHLKDSEWTLPSGXELGNL 
|||||||||||llltllllllllllt:|lllll:ll:| I II II I I I I II I I I I : II II I 
orf la NAKANVSHSALNGNVSLADKAVFHFENSRFTGQLSGSKXTALHLKDSEWTLPSGTELGNL 
840 850 860 870 * 880 890 

610 620 630 640 650 660 

orf 1 . pep NLDNATITLNSAYRHDAAGAQTGSATDAPRRRSRRSRRSLLXVTPPTSVESRFNTLTVNG 
I I I It I I i II I I I t I II I II I t I ::|:lllllltl II II I I I II I I I I I I I I I I I 

orf la NLDNATITLNSAYRHDAAGAQTGXVSDTPRRRSRRS LLSVTPPTSVESRFNTLTVNG 

900 910 920 930 940 950 

670 680 690 700 710 720 

or f 1 . pep KLNGQGTFRFMSELFGYRSDKLKLAESSEGTYTLAVNNTGNEPASLEQLTWEGKDNKPL 
III Iltlllllllltlllllllllltllilltllllllllll:lt:|lllllllltlll 
orf la KLNXQGTFRFMSELFGYRSDKLKLAESSEGTYTLAVNNTGNEPVSLDQLTVVEGKDNKPL 
960 970 980 990 1000 1010 






730 740 750 

orf 1 . pep SENLNFTLQNEHVDAGAW 

I II II I I M I I! I I I I I I 

orf la SENLNFTLQNEHVDAGAWRYQLIRKDGEFRLHNPVKEQELSDKLGKAEAKKQAEKDNAQS 
1020 1030 1040 1050 1060 1070 



orf 1. pep 



orf la LDALIAAGRDAAEKTESVAEPARXAGGENVGIMQTVEEEKKRVQADKDSALAKQREAETRP 
1080 1090 1100 1110 1120 1130 

760 

orf 1. pep LDR 

I I I 

orf la XTTAFPRARXARRDLPQPQPQPQPQPQPQRDLXSRYANSGLSEFSATLNSVFAVQDELDR 
1140 1150 1160 1170 1180 1190 


770 780 790 800 810 820 

or f 1 . pep VFAEDRRNAVWTSGIRDTKHYRSQDFRAYRQQTDLRQIGMQKNLGSGRVGILFSHNRTEN 
It I I M I I II I II II I 1 i I I I I I I I I I I I I I I I t I II I I I II I II I II II II I I I I I I 
orf la VFAEDRRNAVWTSXIRXTKHYRSQDFRAYRQQTDLRQIGMQKNLGSGRVGILFSHNRTEN 
1200 1210 1220 1230 1240 1250 

830 840 850 860 870 880 

orf 1 . pep TFDDGIGNSARLAHGAVFGQYGIDRFYIGISAGAGFSSGSLSDGIGXKXRRRVLHYGIQA 
:! I II M t I II I I I II I II I I I I II I I II : I II t i I I I I 11 I I I I M I I I I I I I I 
orf la XFDDGIGNSARLAHGAVFGQYGIGRFDIGISTGAGFSSGXLSDGIGGKIRRRVLHYGIQA 

1260 1270 1280 1290 1300 1310 

890 900 910 920 930 940 

orf 1 . pep RYRAGFGGFGIEPHIGATRYFVQKADYRYENVNIATPGLAFNRYRAGIKADYSFKPAQHI 
I I I I I I I I t I ! I I : II I t I { I I I I t I It I I I M I I I I I I I II II I I I I I I II II I t I I I 

orf la RYRAGFGGFGIEPYIGATRYFVQKADYRYENVNIATPGLAFNRYRAGIKADYSFKPAQHX 
1320 1330 1340 1350 1360 1370 

950 960 970 980 990 1000 

orf 1 . pep SITPYLSLSYTDAASGKVRTRVNTAVLAQDFGKTRSAEWGVNAEIKGFTLSLHAAAAKGP 

tint I I I I I I I I t I I I I t I I I t I t I I II I t I I I I t I I I I t I I I I I II I I I I I I I I I I 
orf la SITPYXSLSYTDAASGKVRTRVNTAVLAQDFGKTRSAEWGVNAEIKGFTLSXHAAAAKGP 
1380 1390 1400 1410 1420 1430 

1010 1020 

orf 1 . pep QLEAQHSAGIKLGYRWX 
I I I I I I I t I I I I t f I I I 
orf la QLEAQHSAGIKLGYRWX 

1440 1450 



The complete length ORF1a nucleotide sequence <SEQ ID 651> is: 






1 ATGAAAACAA CCGACAAACG GACAACCGAA ACACACCGCA AAGCCCCGAA 
51 AACCGGCCGC ATCCGCTTCT CGCCTGCTTA CTTAGCCATA TGCCTGTCGT 
101 TCGGCATTCT TCCCCAAGCT TGGGCGGGAC ACACTTATTT CGGCATCAAC 
151 TACCAATACT ATCGCGACTT TGCCGAAAAT AAAGGCAAGT TTGCAGTCGG 
201 GGCGAAAGAT ATTGAGGTNT ACAACAAAAA AGGGGAGTTG GTCGGCAAAT 
251 CAATGACAAA AGCCCCGATG ATTGATTTTT CTGTGGTGTC GCGTAACGGC 
301 GTGGCGGCAT TGGTGGGCGA TCAATATATT GTGAGCGTGG CACATAACGG 
351 CGGCTATAAC AACGTTGATT TTGGTGCGGA AGGAAGNAAT CCCGATCAGC 
401 ACCGTTTTTC TTACCAAATT GTGAAAAGAA ATAATTATAA GCCTGACAAT 
451 TCACACCCTT ACAACGGCGA TTANCATATG CCGCGTTTGC ATAAATTTGT 
501 CACAGATGCA GAACCTGTCG AAATGACGAG TGACATGAGG GGGAATACCT 
551 ATTCCGATAA AGAAAAATAT CCCGAGCGTG TCCGCATCGG CTCAGGACAC 
601 CACTATTGGC GTTATGATGA TGACAAACAC GGCGATTTAT CCTACTCCGG 
651 CGCATGGTTA ATTGGCGGCA ATACACATAT GCAGGGTTGG GGAAATAATG 
701 GCGTANTTAG TTTGAGCGGC GATGTGCGCC ATGCCAACGA CTATGGCCCT 
751 ATGCCGATTG CAGGTGCGGC AGGCGACAGC GGTTCGCCAA TGTTTATTTA 
801 TGACAAAACA AACAATAAAT GGCTGCTCAA CGGAGTTTTA CAAACCGGCT 
851 ACCCTTATTC CGGCAGGGAA AACGGTTTCC AGCTGATACG CAAAGATTGG 
901 TTCTACGATG ACATTTACAG AGGCGATACA CATACCGTCT NTTTTGAACC 
951 GCGCAGTAAC GGACATTTTT CCTTTACATC CAACAACAAC GGTACGGGTA 
1001 CGGTAACAGA AACCAACGAA AAGGTNTCCA ATCCAAAGCT TAAAGTACAG 
1051 ACAGTCCGAC TGTTTGACGA ATCTTTGAAT GAAACTGATA AAGAACCAGT 
1101 TTACGCGGCA GGGGGTGTTA ATCAGTACCG TCCAAGGTTA AACAACGGTG 
1151 AAAACCTTTC TTTTATCGAT TACGGCAACG GCAAACTCAT CTTATCAAAC 
1201 AACATCAACC AAGGCGCGGG CGGTTTGTAT TTTGAAGGTG ATTTTACGGT 
1251 CTCGCCTGAA AACAACGAAA CGTGGCAAGG CGCGGGCGTT CATATCAGTG 
1301 AAGACAGTAC CGTTACTTGG AAAGTAAACG GCGTGGCAAA CGACCGCCTG 
1351 TCCAAAATCG GCAAAGGCAC GCTGCACGTT CAAGCCAAAG GGGAAAACCA 
1401 AGGCTCGATC AGCGTGGGCG ACGGTACAGT CATTTTGGAT CAGCAGGCAG 
1451 ACGATAAAGG CAAAAAACAA GCCTTTAGTG AAATCGGCTT GNTCAGCGGC 
1501 AGGGGTACGG TGCAACTGAA TGCCGATAAT CAGTTCAACC CCGACAAACT 
1551 CTATTTCGGC TTTCGCGGCG GACGTTTGGA TTTAAACGGG CATTCGCTTT 
1601 CGTTCCACCG TATTCAAAAT ACCGATGAAG GGGCGATGAT TGNCNATCAT 
1651 AATGCCACAA CAACATCCAC CGTTACCATT ACAGGGAATG AAAGTATTAC 
1701 ACAACCGAGT GGTAAGAATA TCAATAGACT TAATTACAGC AAAGAAATTG 
1751 CCTACAACGG TTGGTTTGGC GAGAAAGATA CGACCAAAAC GAACGGGCGG 
1801 CTCAACCTTG TTTACCAGCC CGCCGCAGAA GACCGCACCC NGCTGCTTTC 
1851 CGGCGGAACA AATTTAAACG GCAACATCAC GCAAACAAAC GGCAAACTGT 
1901 TTTTCAGCGG CAGACCGACA CCGCACGCCT ACAATCATTT AGGAAGCGGG 
1951 TGGTCAAAAA TGGAAGGTAT CCCACAAGGA GAAATCGTGT GGGACAACGA 
2001 CTGGATCNAC CGCACGTTTA AAGCGGAAAA TTTCCATATT CAGGGCGGGC 
2051 AGGCGGTGAT TTCCCGCAAT GTTGCCAAAG TGGAAGGCGA TTGNCATTTG 
2101 AGCAATCACG CCCAAGCAGT TTTTGGTGTC GCACCGCATC AAAGCCATAC 
2151 AATCTGTACA CGTTCGGACT GGACNGGTCT GACAAATTGT GTCGAANAAA 
2201 NCATTACCGA CGATAAAGTG ATTGCTTCAT TGACTAAGAC NGACNTNAGC 
2251 GGCANTGTNA GNCTNNCCNA TNACGNTNNT TNAAANCTCN CNGGGCNTGC 
2301 NNCACTNAAN GGCAATCTTA GTGCAAATGG CGATACACGT TATACAGTCA 
2351 GCCACAACGC CACCCAAAAC GGCAACCTTA GCCTCGTGGG CAATGCCCAA 
2401 GCAACATTTA ATCAAGCCAC ATTAAACGGC AACNCATCGG NTTCGGGCAA 
2451 TGCTTCATTT AATCTAAGCA ACAACGCCGC ACAAAACGGC AGTCTGACGC 
2501 TTTCCGACAA CGCTAAGGCA AACGTAAGCC ATTCCGCACT CAACGGCAAT 
2551 GTCTCCCTAG CCGATAAGGC AGTATTCCAT TTTGAAAACA GCCGCTTTAC 
2601 CGGACAACTC AGCGGCAGCA AGGANACAGC ATTACACTTA AAAGACAGCG 
2651 AATGGACGCT GCCGTCAGGC ACGGAATTAG GCAATTTAAA CCTTGACAAC 
2701 GCCACCATTA CACTCAATTC CGCCTATCGC CACGATGCTG CAGGCGCGCA 
2751 AACCGGCAGN GTGTCAGACA CGCCGCGCCG CCGTTCGCGC CGTTCCCTAT 
2801 TATCCGTTAC ACCGCCAACT TCGGTAGAAT CCCGTTTCAA CACGCTGACG 
2851 GTAAACGGCA AATTGAACNG TCAAGGAACA TTCCGCTTTA TGTCGGAACT 
2901 CTTCGGCTAC CGAAGCGACA AATTGAAGCT GGCGGAAAGT TCCGAAGGNA 
2951 CTTACACCTT GGCGGTCAAC AATACCGGCA ACGAACCCGT AAGCCTCGAT 
3001 CAATTGACGG TAGTGGAAGG GAAAGACAAC AAACCGCTGT CCGAAAACCT 
3051 TAATTTCACC CTGCAAAACG AACACGTCGA TGCCGGCGCG TGGCGTTACC 
3101 AACTCATCCG CAAAGACGGC GAGTTCCGCC TGCATAATCC GGTCAAAGAA 
3151 CAAGAGCTTT CCGACAAACT CGGCAAGGCA GAAGCCAAAA AACAGGCGGA 
3201 AAAAGACAAC GCGCAAAGCC TTGACGCGCT GATTGCGGCC GGGCGCGATG 
3251 CCGCCGAAAA GACAGAAAGC GTTGCCGAAC CGGCCCGGCN GGCAGGCGGG 
3301 GAAAATGTCG GCATTATGCA GGCGGAGGAA GAGAAAAAAC GGGTGCAGGC 
3351 GGATAAAGAC AGCGCNTTGG CGAAACAGCG CGAAGCGGAA ACCCGGCCGG 
3401 NTACCACCGC CTTCCCCCGC GCCCGCNGCG CCCGCCGGGA TTTGCCGCAA 
3451 CCGCAGCCCC AACCGCAACC TCAACCCCAA CCGCAGCGCG ACCTGATNAG 
3501 CCGTTATGCC AATAGCGGTT TGAGTGAATT TTCCGCCACG CTCAACAGCG 
3551 TTTTCGCCGT ACAGGACGAA TTGGACCGCG TGTTTGCCGA AGACCGCCGC 
3601 AACGCNGTTT GGACAAGCNG CATCCGGNAC ACCAAACACT ACCGTTCGCA 

3651 AGATTTCCGC GCCTACCGCC AACAAACCGA CCTGCGCCAA ATCGGTATGC 

3701 AGAAAAACCT CGGCAGCGGG CGCGTCGGCA TCCTGTTTTC GCACAACCGG 

3751 ACCGAAAACA NCTTCGACGA CGGCATCGGC AACTCGGCAC GGCTTGCCCA 

3801 CGGCGCCGTT TTCGGGCAAT ACGGCATCGG CAGGTTCGAC ATCGGCATCA 

3851 GCACGGGCGC GGGTTTTAGC AGCGGCANTC TNTCAGACGG CATCGGAGGC 

3901 AAAATCCGCC GCCGCGTGCT GCATTACGGC ATTCAGGCAC GATACCGCGC 

3951 CGGTTTCGGC GGATTCGGCA TCGAACCGTA CATCGGCGCA ACGCGCTATT 

4001 TCGTCCAAAA AGCGGATTAC CGCTACGAAA ACGTCAATAT CGCCACCCCC 

4051 GGTCTTGCGT TCAACCGNTA CCGNGCGGGC ATTAAGGCAG ATTATTCATT 

4101 CAAACCGGCG CAACACATNT CCATCACNCC TTATTTNAGC CTGTCCTATA 

4151 CCGATGCCGC TTCGGGCAAA GTCCGAACAC GCGTCAATAC CGCNGTATTG 

4201 GCTCAGGATT TCGGCAAAAC CCGCAGTGCG GAATGGGGCG TAAACGCCGA 

4251 AATCAAAGGT TTCACGCTGT CCNTCCACGC TGCCGCCGCC AAAGGNCCGC 

4301 AACTGGAAGC GCAACACAGC GCGGGCATCA AATTAGGCTA CCGCTGGTAA 

This encodes a protein having amino acid sequence <SEQ ID 652>: 



1 MKTTDKRTTE THRKAPKTGR IRFSPAYLAI CLSFGILPQA WAGHTYFGIN 
51 YQYYRDFAEN KGKFAVGAKD IEVYNKKGEL VGKSMTKAPM IDFSVVSRNG 
101 VAALVGDQYI VSVAHNGGYN NVDFGAEGXN PDQHRFSYQI VKRNNYKPDN 
151 SHPYNGDXHM PRLHKFVTDA EPVEMTSDMR GNTYSDKEKY PERVRIGSGH 
201 HYWRYDDDKH GDLSYSGAWL IGGNTHMQGW GNNGVXSLSG DVRHANDYGP 
251 MPIAGAAGDS GSPMFIYDKT NNKWLLNGVL QTGYPYSGRE NGFQLIRKDW 
301 FYDDIYRGDT HTVXFEPRSN GHFSFTSNNN GTGTVTETNE KVSNPKLKVQ 
351 TVRLFDESLN ETDKEPVYAA GGVNQYRPRL NNGENLSFID YGNGKLILSN 
401 NINQGAGGLY FEGDFTVSPE NNETWQGAGV HISEDSTVTW KVNGVANDRL 
451 SKIGKGTLHV QAKGENQGSI SVGDGTVILD QQADDKGKKQ AFSEIGLXSG 
501 RGTVQLNADN QFNPDKLYFG FRGGRLDLNG HSLSFHRIQN TDEGAMIXXH 
551 NATTTSTVTI TGNESITQPS GKNINRLNYS KEIAYNGWFG EKDTTKTNGR 
601 LNLVYQPAAE DRTXLLSGGT NLNGNITQTN GKLFFSGRPT PHAYNHLGSG 
651 WSKMEGIPQG EIVWDNDWIX RTFKAENFHI QGGQAVISRN VAKVEGDXHL 
701 SNHAQAVFGV APHQSHTICT RSDWTGLTNC VEXXITDDKV IASLTKTDXS 
751 GXVXLXXXXX XXLXGXAXLX GNLSANGDTR YTVSHNATQN GNLSLVGNAQ 
801 ATFNQATLNG NXSXSGNASF NLSNNAAQNG SLTLSDNAKA NVSHSALNGN 
851 VSLADKAVFH FENSRFTGQL SGSKXTALHL KDSEWTLPSG TELGNLNLDN 
901 ATITLNSAYR HDAAGAQTGX VSDTPRRRSR RSLLSVTPPT SVESRFNTLT 
951 VNGKLNXQGT FRFMSELFGY RSDKLKLAES SEGTYTLAVN NTGNEPVSLD 
1001 QLTVVEGKDN KPLSENLNFT LQNEHVDAGA WRYQLIRKDG EFRLHNPVKE 
1051 QELSDKLGKA EAKKQAEKDN AQSLDALIAA GRDAAEKTES VAEPARXAGG 
1101 ENVGIMQAEE EKKRVQADKD SALAKQREAE TRPXTTAFPR ARXARRDLPQ 
1151 PQPQPQPQPQ PQRDLXSRYA NSGLSEFSAT LNSVFAVQDE LDRVFAEDRR 
1201 NAVWTSXIRX TKHYRSQDFR AYRQQTDLRQ IGMQKNLGSG RVGILFSHNR 
1251 TENXFDDGIG NSARLAHGAV FGQYGIGRFD IGISTGAGFS SGXLSDGIGG 
1301 KIRRRVLHYG IQARYRAGFG GFGIEPYIGA TRYFVQKADY RYENVNIATP 
1351 GLAFNRYRAG IKADYSFKPA QHXSITPYXS LSYTDAASGK VRTRVNTAVL 
1401 AQDFGKTRSA EWGVNAEIKG FTLSXHAAAA KGPQLEAQHS AGIKLGYRW* 

A transmembrane region is underlined. 



ORF1-1 shows 86.3% identity over a 1462aa overlap with ORF1a: 

10 20 30 40 50 60 

orf la . pep MKTTDKRTTETHRKAPKTGRIRFSPAYLAICLSFGILPQAWAGHTYFGIN YQYYRDFAEN 

I I I I I M M I I I I I I I i I I I I I I i I I I I i t I I t t I I I I I I I It M I I I I I I t I 1 I I I I i I 
orf 1-1 MKTTDKRTTETHRKAPKTGRIRFSPAYLAICLSFGILPQAWAGHTYFGINYQYYRDFAEN 

10 20 30 40 50 60 

70 80 90 100 110 120 

orf la . pep KGKFAVGAKDIEVYNKKGELVGKSMTKAPMIDFSVVSRNGVAALVGDQYIVSVAHNGGYN 

1 1 1 1 1 1 1 1 1 1 i 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 i n 1 1 1 M i 1 1 1 1 1 i i 1 1 1 1 1 1 1 1 1 1 1 1 

orf 1-1 KGKFAVGAKDIEVYNKKGELVGKSMTKAPMIDFSWSRNGVAALVGDQYIVSVAHNGGYN 

70 80 90 100 110 120 

130 140 150 160 170 179 

orf la . pep NVDFGAEGXNPDQHRFSYQIVKRNNYKPDNS-HPYNGDXHMPRLHKFVTDAEPVEMTSDM 

Itilllll llllin:l:||ltlltl :: lll:lt I I i I I M M t I I I I I t I I I I 
orf 1-1 NVDFGAEGRNPDQHRFTYKIVKRNNYKAGTKGHPYGGDYHMPRLHKFVTDAEPVEMTSYM 
130 140 150 160 170 180 
900 910 920 930 940 
orf la . pep TELGNLNLDNATITLNSAYRHDAAGAQTGXVSDTPRRRSRRS LLSVTPPTSVESRFN 

I t I I I I I t t I M I I I I I I I I I I I f 1 I I I I :: I : t I I I I I I I I I t I I I t t I I I I M t 
orf 1-1 TELGNLNLDNATITLNSAYRHDAAGAQTGSATDAPRRRSRRSRRSLLSVTPPTSVESRFN 

900 910 920 930 940 950 



950 960 970 980 990 1000 

orf la . pep TLTVNGKLNXQGTFRFMSELFGYRSDKLKLAESSEGTYTLAVNNTGNEPVSLDQLTVVEG 

I I I I I I I I I I I I M I I I I M I I I I I M I M I I I I I I I I I I M I I I I I i : t I : I I I I t I I 
orf 1-1 TLTVNGKLNGQGTFRFMSELFGYRSDKLKLAESSEGTYTLAVNNTGNEPASLEQLTWEG 

960 970 980 990 1000 1010 



1010 1020 1030 1040 1050 1060 

or f la . pep KDNKPLSENLNFTLQNEHVDAGAWRYQLIRKDGEFRLHNPVKEQELSDKLGKAEAKKQAE 
t I I I I t I I I I I I I I t I I t I I I I i I I I I I I I I I I I I I ) I I I I I I I M I I I t I I i I I I I I I I 
orf 1-1 KDNKPLSENLNFTLQNEHVDAGAWRYQLIRKDGEFRLHNPVKEQELSDKLGBCAEAKKQAE 

1020 1030 1040 1050 1060 1070 



1070 1080 1090 1100 1110 1120 

or f la . pep KDNAQSLDALIAAGRDAAEKTESVAEPARXAGGENVGIMQAEEEKKRVQADKDSALAKQR 
I I I M I I I I I I I I I I M : I I I I I I I i I t I I I I t M I I i I I t I t I I I I I I I I I : I I t I I I 
orf 1-1 KDNAQSLDALIAAGRDAVEKTESVAEPARQAGGENVGIMQAEEEKKRVQADKDTALAKQR 

1080 1090 1100 1110 1120 1130 



1130 1140 1150 1160 1170 1180 

orf la . pep EAETRPXTTAFPRARXARRDLPQPQPQPQPQPQPQRDLXSRYANSGLSEFSATLNSVFAV 
i I I I I I I I I I I I I I I I I N I I I I I I I I M nil I I t I I I I I I 1) i I i I t i I I I I 
orf 1-1 EAETRPATTAFPRARRARRDLPQLQPQPQPQP— QRDLISRYANSGLSEFSATLNSVFAV 

1140 1150 1160 1170 1180 1190 



1190 1200 1210 1220 1230 1240 

or f la . pep QDELDRVFAEDRRNAVWTSXIRXTKHYRSQDFRAYRQQTDLRQIGMQKNLGSGRVGILFS 
{ I I I t I I i i I I I I I t I I I I It I t I I II 1 I II M I t II t II II I t II I t I II I t I M I I 
orf 1-1 QDELDRVFAEDRRNAVWTSGIRDTKHYRSQDFRAYRCXyrDLRQIGMQKNLGSGRVGILFS 
1200 1210 1220 1230 1240 1250 



1250 1260 1270 1280 1290 1300 

orf la . pep HNRTENXFDDGIGNSARLAHGAVFGQYGIGRFDIGISTGAGFSSGXLSDGIGGKIRRRVL 
llllll:|[||||||llltllllllllll 11 lllhlllllll IMIIIIIIIllll 
orf 1-1 HNRTENTFDDGIGNSARLAHGAVFGQYGIDRFYIGISAGAGFSSGSLSDGIGGKIRRRVL 
1260 1270 1280 1290 1300 1310 



1310 1320 1330 1340 1350 1360 

orf la . pep HYGIQARYRAGFGGFGIEPYIGATRYFVQKADYRYENVNIATPGLAFNRYRAGIKADYSF 
II II II M I I II i I i II I t: I I I I I I I I II I I I II I I I II I tl t II I I M I I n II M I I 
orf 1-1 HYGIQARYRAGFGGFGIEPHIGATRYFVQKADYRYENVNIATPGLAFNRYRAGIKADYSF 
1320 1330 1340 1350 1360 1370 



1370 1380 1390 1400 1410 1420 

or f la . pep KPAQHXSITPYXSLSYTDAASGKVRTRVNTAVLAQDFGKTRSAEWGVNAEIKGFTLSXHA 
I II I I I II I I I I II II I I II I I I I I I II I I I II I I I II I I II I t I I i II I II II t II 
orf 1-1 KPAQHISITPYLSLSYTDAASGKVRTRVNTAVLAQDFGKTRSAEWGVNAEIKGFTLSLHA 
1380 1390 1400 1410 1420 1430 

1430 1440 1450 

orf la . pep AAAKGPQLEAQHSAGIKLGYRWX 
I I M I I I I I II II t I I II I I I I I 
orf 1-1 AAAKGPQLEAQHSAGIKLGYRWX 
1440 1450 

Homology with adhesion and penetration protein hap precursor of H.influenzae (accession number P45387)
Amino acids 23-423 of ORF1 show 59% aa identity with hap protein in 450aa overlap:



orfl 23 FXAAYLAICLSFGILPQAWAGHTYFGINYQYYRDFAENKGKFAVGAKDIEVYNKKGELVG 82 

F +L C+S GI QAWAGHTYFGI+YQYYRDFAENKGKF VGAK+IEVYNK+G+LVG 
hap 6 FRLNFLTACVSLGIASQAWAGHTYFGIDYQYYRDFAENKGKFTVGAKNIEVYKKEGQLVG 65 

orfl 83 KSMTKAPMIDFSWSRNGVAALVGVQYIVSVAHNGGYNNVDFGAEGXNIXDQXRXTYKIV 142 

SMTKAPMIDFSWSRNGVAALVG QYIVSVAHNGGYN+VDFGAEG N DQ R TY+IV 
hap 66 TSMTKAPMIDFSWSRNGVAALVGDQYIVSV/VHNGGYNDVDFGAEGRN-PDQHRFTYQXV 124 






orfl 143 KRNNYKAGTKGHPYGGDYHMPRLHKXVTDAEPVEMTSYMDGRKYIDQNNYPDRVRIGAGR 202 

KRNNY+A + HPY GDYHMPRLHK VT+AEPV MT+ MDG+ Y D+ NYP+RVRIG+GR 
hap 125 KRNNYQAWERKHPYDGDYHMPRLHKFVTEAEPVGMTTNMDGKVYADRENYPERVRIGSGR 184 

orfl 203 QYWRSDEDEPNNRESSYHIA 222 

QYWR+D+DE N SSY+++ 
hap 185 QYWRTDKDEETNVHSSYYVSGAYRYLTAGNTHTQSGNGNGTVNLSGNWSPNHYGPLPTG 244 

10 orfl 223 SGSPMFIYDAQKQKWLINGVLQTGNPYIGKSNGFQLVRKDWFYDEIFAGDTHSVF 277 

SGSPMFIYDA+K++WLIN VLQTG+P+ G+ NGFQL+R++WFY+E+ A DT SVF 
hap 245 GSKGDSGSPMFIYDAKKKQWLINAVLQTGHPFFGRGNGFQLIREEWFYNEVLAVDTPSVF 304 

orfl 278 — YEPRQNGKYSFNDDNNGTGKIN-AKHEHNSLPNRLKTRTVQLFNVSLSETAREPVYHA 334 
15 Y P NG YSF +N+GTGK+ + + + + TV+LFN SL++TA+E V A 

hap 305 QRYIPPINGHYSFVSNNDGTGKLTLTRPSKDGSKAKSEVGTVKLFNPSLNQTAKEHV-KA 363 

orfl 335 AGGVNSYRPRLNNGENISFIDEGKGELILTSNINQGAGGLYFQGDFTV-SPENNETWQGA 393 
A G N Y+PR+ G+NI D+GKG L + +NINQGAGGLYF+G+F V +NN TWQGA 
20 hap 364 AAGYNIYQPRMEYGKNIYLGDQGKGTLTIENNINQGAGGLYFEGNFWKGKQNNITWQGA 423 

orfl 394 GVHISEDSTVTWKVNGVTUJDRLSKIGKGTL 423 

GV I +D+TV WKV+ NDRLSKIG GTL 
hap 424 GVSIGQDATVEWKVHNPENDRLSKIGIGTL 453 

Amino acids 715-1011 of ORF1 show 50% aa identity with hap protein in 258aa overlap:

Orfl 41 DTRYTVSHNATQ-NGNXSLVXNAQATFNQ-ATLNGNTSASGNASFNLSDHAVQNGSLTLS 98 

DT+ S TQ NG+ +L NA + A LNGN + ++ F LS++A Q G++ LS 
hap 733 DTKVINSIPITQINGSINLTNNATVNIHGLAKLNGNVTLIDHSQFTLSNNATQTGNIKLS 792 

30 orfl 99 GNAKANVSHSALNGNVSLADKAVFHFESSRFTGQISGGKDTALHLKDSEWTLPSGXELGN 158 

+A A V+++ LNGNV L D A F ++S F QI G KDT + L+++ WT+PS L N 
hap 793 NHANATVNNATLNGNVHLTDSAQFSLKNSHFWHQIQGDKDTTVTLENATWTMPSDTTLQN 852 

orfl 159 LNLDNATITLNSAYRHDAAGAQTGSATDAPXXXXXXXXXXLLXVTPPTSVESRFNTLTVN 218 
35 L L+N+T+TLNSAY + S+ +AP L T PTS E RFNTLTVN 

hap 853 LTLNNSTVTLNSAY SASSNNAPRHRRS LETETTPTSAEHRFNTLTVN 899 

orfl 219 GKLNGQGTFRFMSELFGYRSDKLKLAESSEGTYTLAVNNTGNEPASLEQLTWEGKDNKP 278 
GKL+GQGTF+F S LFGY+SDKLKL+ +EG YTL+V NTG EP +LEQLT++E DNKP 
40 hap 900 GKLSGQGTFQFTSSLFGYKSDKLKLSNDAEGDYTLSVRNTGKEPVTLEQLTLIESLDNKP 959 

orfl 279 LSENLNFTLQNEHVDAGA 296 

LS+ L FTL+N+HVDAGA 
hap 960 LSDKLKFTLENDHVDAGA 977 

Amino acids 1192-1450 of ORF1 show 41% aa identity with hap protein in 259aa overlap:

Orfl 1 LDRVFAEDRRNAVWTSGIRDTKHYRSQDFRAYRQQTDLRQIGMQKNLGSGRVGILFSHNR 60 

LDR+F + ++AVWT+ +D + Y S FRAY+Q+T+LRQIG+QK L +GR+G +FSH+R 
hap 1135 LDRLFVDQAQSAVWTNIAQDKRRYDSDAFRAYQQKTNLRQIGVQKALANGRIGAVFSHSR 1194 

50 orfl 61 TENTFDDGIGNSARLAHGAVFGQYGIDRFYXXXXXXXXXXXXXXXXXIGXKXRRRVLHYG 120 

++NTFD+ + NAL+FQY KR+ ++YG 

hap 1195 SDNTFDEQVKNHATLTMMSGFAQYQWGDLQFGVNVGTGISASKMAEEQSRKIHRKAINYG 1254 

orfl 121 IQARYRAGFGGFGIEPHIGATRYFVQKADYRYENVNIATPGLAFNRYRAGIKADYSFKPA 180 
55 + A Y+ G GI+P+ G RYF+++ +Y+ E V + TP LAFNRY AGI+ DY+F P 

hap 1255 VNASYQFRLGQLGIQPYFGVNRYFIERENYQSEEVRVKTPSLAFNRYNAGIRVDYTFTPT 1314 

orfl 181 QHISITPYLSLSYTD7VASGKVRTRVNTAVLAQDFGKTRSAEWGVNAEIKGFTLSLHAAAA 240 
+IS+ PY ++Y D ++ V+T VN VL Q FG+ E G+ AEI F +S + + 
60 hap 1315 DNISVKPYFFVNYVDVSNANVOTTVNLTVLQQPFGRYWQKEVGLKAEILHFQISAFISKS 1374 

orfl 241 KGPQLEAQHSAGIKLGYRW 259 

+G QL Q + G+KLGYRW 
hap 1375 QGSQLGKQQNVGVKLGYRW 1393 

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Homology with a predicted ORF from N.gonorrhoeae 

The blocks of ORF1 show 83.5%, 88.3%, and 97.7% identities in 467, 298, and 259 aa overlap,
respectively, with a predicted ORF (ORF1ng) from N.gonorrhoeae:
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orf l.pep 
orf Ing 
orf l.pep 
orf Ing 
orfl -pep 
orf Ing 
orf l.pep 
orf Ing 
orfl . pep 
orf Ing 
orfl .pep 
orf Ing 
orf 1 .pep 
orflng 
orf l.pep 
orflng 
orf l.pep 
orflng 
orf l.pep 
orflng 
orf 1 .pep 
orflng 
orf l.pep 
orflng 
orf l.pep 
orflng 
orf 1 .pep 
orflng 
orf l.pep 
orflng 
orf l.pep 
orflng 



MKTTDKRTTETHRKAPKTGRIRFXAAYLAICLSFGILPQAWAGHTYFGINYQYYRDFAEN 
I I I I It I I t I I I t I I I I I I I I I I I i I I t I I I t I I I t I I i I I I I f I I i I i I I t I I I I I 
MKTTDKRTTETHRKAPKTGRIRFSPAYLAICLSFGILPQARAGHTYFGINYQYYRDFAEN 



VQLFNVSLSETAREPVYHAAGGVNSYRPRLNNGENISFIDEGKGELILTSNINQGAGGLY 
I I I I I I i I i I I I I I i I I I I I I 1 I I I I i I I I I I I I I i I i It : M I I I I I I I I I I I i I [ i I I 
VQLFNVSLSETAREPVYHAAGGVNSYRPRLNNGENISFIDKGKGELILTSNINQGAGGLY 

FQGDFTVSPENNETWQGAGVHISEDSTVTWKVNGVANDRLSKIGKGT 
I : I : I I I I t : t I I I M I I I I I I I : I I 1 t I i I I I I I I I I i I I I I I I I 
FEGNFTVSPKNNETWQGAGVHISDGSTVTWKVNGVANDRLSKIGKGTLLVQAKGENQGSV 

// 

DKVTASLTKTDISGNVDLADHAHLNLTGLA 
III 111:111: I II : II I I I I I I I I I I I 
FGVAPHQSHTICTRSDWTGLTSCTEKTITDDKVIASLSKTDVRGNVSIiADHAHLNLTGLA 



60 



60 



120 



KGKFAVGAKDIEVYNKKGELVGKSMTKAPMIDFSWSRNGVAALVGVQYIVSVAHNGGYN 
I II I I t I II I t II t I I I t II I I I I I I I II I I I I I I ) t I I II I I I : t I I I I II t I t II I I 
KGKFAVGAKDIEVYNKKGELVGKSMTKAPMIDFSWSRNGVAALAGDQYIVSVAHNGGYN 120 

NVDFGAEGXNIXDQXRXTYKIVKRNNYKAGTKGHPYGGDYHMPRLHKXVTDAEPVEMTSY 180 
I I I I M II I II I : I: I II I i I I I I I I: I I I I II t I I I I I I I I II I I I I I I I I I I 
NVDFGAEGSN-PDQHRFSYQIVKRNNYKAGTNGHPYGGDYHMPRLHKFVTDAEPVEMTSY 179 

MDGRKYIDQNNYPDRVRIGAGRQYWRSDEDEPNNRESSYHIAS 223 

III II I I : I I I I t 1 I I I I I I II I I I II I I I II I I I I I I I I 

MDGWKYADLNKYPDRVRIGAGRQYWRSDEDEPNNRESSYHIASAYSWLVGGNTFAQNGSG 239 

GS PMFI YDA QKQKWLIN GVLOTGNPYIGKSNG 255 

I II II I II I I I I I I I I I I I I t I I I I M I I I M 
GGTVNLGSEKIKHSP YGFLPTGGSFGDSGSPMFIYDA QKQKWLIN GVLOTGNPYIGKSNG 289 

FQLVRKDWFYDEIFAGDTHSVFYEPRQNGKYSFNDDNNGTGKINAKHEHNSLPNRLKTRT 315 
I I I t I I I I I I I I I I I I I I i I I I I II : I I I I I I I I : I I i : I I I : I M : I III I I I I I I 
FOLVRKDWFYDEIFAGDTHSVFYEPHQNGKYFFNDNNNGAGKIPAKHKHYSLPYRLKTRT 359 



375 



422 



479 



744 



774 



803 



TLNGNLSANGDTR-YTVSHNATQNGNXSLVXNAQATFNQATLNGNTSASGNASFNLSDHA 
i : I I I I : :: : I I : I I It I I I III I I I M 1 I I I I I I I I II t I I I I I I I I:: I 
TFNGNL- VQAETRT IRLRANATQNGNLSLVGNAQAT FNQATLNGNT SAS DNAS FNLSNNA 833 

VQNGSLTLSGNAICANVSHSALNGNVSLADKAVFHFESSRFTGQISGGKE>TALHLKDSEWT 8 63 
I I I I I I II I I I I I t t I t I t I I I I I t I I I I I I I I I I : I I I I I : I I II i I I I I I I I I I I I I 
VQNGSLTLSDNAKANVSHSALNGNVSLADKAVFHFENSRFTGKISGGKDTALHLKDSEWT 893 

LPSGXELGNLNLDNATITLNSAYRHDAAGAQTGSATDAPRRRSRRSRRSLLXVTPPTSVE 923 
lltl:IIIIIIIIIIIIIIIIIIIIIIIIIIIIII:|MIIIIIII II llltl|:| 
LPSGTELGNLNLDNATITLNSAYRHDAAGAQTGSAADAPRRRSRRS LLSVTPPTSAE 950 

SRFNTLTVNGKLNGQGTFRFMSELFGYRSDKLKLAESSEGTYTLAVNNTGNEPASLEQLT 983 
I I I I I I I t I I I I I I I I I I I I t I I I I I I I I llllllllllllllllllllllttlttlll 
SRFNTLTVNGKLNGQGTFRFMSELFGYRSGKLKLAESSEGTYTLAVNNTGNEPVSLEQLT 1010 

WEGKDNKPLSENLNFTLQNEHVDAGAW 1011 
I t t I I I I t I I M I 1 I I I I I I I I I I I 1 I 

WEGKDNTPLSENLNFTLQNEHVDAGAWRYQLIRKDGEFRLHNPVKEQELSDKLGKAGET 1070 

// 

LDRVFAEDRRNAVWTSGIRDTKHYRSQDFR 1211 
I I I I t i I I I I I I I I I I I I I i I I I I I I I I I I 
PQRDLISRYANSGLSEFSATLNSVFAVQDELDRVFAEDRRNAVWTSGIRDTKHYRSQDFR 1239 

AYRQQTDLRQIGMQKNLGSGRVGILFSHNRTENTFDDGIGNSARLAHGAVFGQYGIDRFY 1271 
I I I I I I I I I I I I I 1 I I I I I 1 I I I I I I t I I I I I I t I I I I I I I I I I I I I I I I I I I I I II 
AYRQQTDLRQIGMQKNLGSGRVGILFSHNRTGNTFDDGIGNSARLAHGAVFGQYGIGRFD 1299 
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orf 1 . pep IGISAGAGFSSGSLSDGIGXKXRRRVLHYGIQARYRAGFGGFGIEPHIGATRYFVQKADY 1331 

I I I I i I i I I I I M I I i I I i I I ) I t I I I I t I I I I I I I I I t I i I I I I I I I I I M I t I I I 
orflng IGISAGAGFSSGSLSDGIRGKIRRRVLHYGIQARYRAGFGGFGIEPHIGATRYFVQKADY 1359 

orf 1. pep RYENVNIATPGLAFNRYRAGIKADYSFKPAQHISITPYLSLSYTDAASGKVRTRVNTAVL 1391 

I I I t I I I I I I I I I I I I I I I I M I I I f I i I I I I I I I t I I I I I I I I I t I t M I M I I I I I t I 
orflng RYENVNIATPGLAFNRYRAGIKADYSFKPAQHISITPYLSLSYTDAASGKVRTRVNTAVL 1419 

or f 1 . pep AQDFGKTRSAEWGVNAEIKGFTLSLHAAAAKGPQLEAQHSAGIKLGYRW 1440 

IIIIIMMIIIIIIIIIIIIIIIIIIIilinilllltltllMltli 
orflng AQDFGEeTRSAEWGVNAEIKGFTLSLHAAAAKGPQLEAQHSAGIKLGYRW 14 68 

The complete length ORF1ng nucleotide sequence was identified <SEQ ID 653>:



1 ATGAAAACAA CCGACAAACG GACAACCGAA ACACACCGCA AAGCCCCTAA
51 AACCGGCCGC ATCCGCTTCT CGCCCGCTTA CTTAGCCATA TGCCTGTCGT
101 TCGGCATTCT GCCCCAAGCC CGGGCGGGAC ACACTTATTT CGGCATCAAC
151 TACCAATACT ATCGCGACTT TGCCGAAAAT AAAGGCAAGT TTGCAGTCGG
201 GGCGAAAGAT ATTGAGGTTT ACAACAAAAA AGGGGAGTTG GTCGGCAAAT
251 CGATGACGAA AGCCCCGATG ATTGATTTTT CTGTGGTATC GCGTAACGGC
301 GTGGCGGCAT TGGCGGGCGA TCAATATATT GTGAGCGTGG CACATAACGG
351 CGGCTATAAC AATGTTGATT TTGGTGCGGA GGGAAGCAAT CCCGATCAGC
401 ACCGCTTTTC TTACCAAATT GTGAAAAGAA ATAATTATAA AGCAGGGACT
451 AACGGCCATC CTTATGGCGG CGATTATCAT ATGCCGCGTT TGCACAAATT
501 TGTCACAGAT GCAGAACCTG TTGAGATGAC CAGTTATATG GATGGGTGGA
551 AATACGCTGA TTTAAATAAA TACCCTGATC GTGTTCGAAT CGGAGCAGGC
601 AGACAATATT GGCGGTCTGA TGAAGACGAA CCCAATAACC GCGAAAGTTC
651 ATATCATATT GCAAGCGCAT ATTCTTGGCT CGTCGGTGGC AATACCTTTG
701 CACAAAATGG ATCAGGTGGT GGCACAGTCA ACTTAGGTAG CGAAAAAATT
751 AAACATAGCC CATATGGTTT TTTACCAACA GGAGGCTCAT TTGGCGACAG
801 TGGCTCACCA ATGTTTATCT ATGATGCCCA AAAGCAAAAG TGGTTAATTA
851 ATGGGGTATT GCAAACAGGC AACCCCTATA TAGGAAAAAG CAATGGCTTC
901 CAGCTAGTTC GTAAAGATTG GTTCTATGAT GAAATCTTTG CTGGAGATAC
951 CCATTCAGTA TTCTACGAAC CACATCAAAA TGGGAAATAC TTTTTTAACG
1001 ACAATAATAA TGGCGCAGGA AAAATCGATG CCAAACATAA ACACTATTCT
1051 CTACCTTATA GATTAAAAAC ACGAACCGTT CAATTGTTTA ATGTTTCTTT
1101 ATCCGAGACA GCAAGAGAAC CTGTTTATCA TGCTGCAGGT GGGGTCAACA
1151 GTTATCGACC CAGACTGAAT AATGGAGAAA ATATTTCCTT TATTGACAAA
1201 GGAAAAGGTG AATTGATACT TACCAGCAAC ATCAACCAAG GCGCGGGCGG
1251 TTTGTATTTT GAGGGTAATT TTACGGTCTC GCCTAAAAAC AACGAAACGT
1301 GGCAAGGCGC GGGCGTTCAT ATCAGTGATG GCAGTACCGT TACTTGGAAA
1351 GTAAACGGCG TGGCAAACGA CCGCCTGTCC AAAATCGGCA AAGGCACGCT
1401 GCTGGTTCAA GCCAAAGGGG AAAACCAAGG CTCGGTCAGC GTGGGCGACG
1451 GTAAAGTCAT CTTAGATCAG CAGGCGGACG ATCAAGGCAA AAAACAAGCC
1501 TTTAGTGAAA TCGGCTTGGT CAGCGGCAGG GGGACGGTGC AACTGAATGC
1551 CGATAATCAG TTCAACCCCG ACAAACTCTA TTTCGGCTTT CGCGGCGGAC
1601 GTTTGGATTT GAACGGGCAT TCGCTTTCGT TCCACCGCAT TCAAAATACC
1651 GATGAAGGGG CGATGATTGT CAACCACAAT CAAGACAAAG AATCCACCGT
1701 TACCATTACA GGCAATAAAG ATATTACTAC AACCGGCAAT AACAACAACT
1751 TGGATAGCAA AAAAGAAATT GCCTACAACG GTTGGTTTGG CGAGAAAGAT
1801 GCAACCAAAA CGAACGGGCG GCTCAATCTG AATTACCAAC CGGAAGAAGC
1851 GGATCGCACT TTACTGCTTT CCGGCGGAAC AAATTTAAAC GGCAATATCA
1901 CGCAAACAAA CGGCAAACTG TTTTTCAGCG GCAGACCGAC ACCGCACGCC
1951 TACAATCATT TAGGAAGCGG GTGGTCAAAA ATGGAAGGTA TCCCACAAGG
2001 AGAAATCGTG TGGGACAACG ATTGGATCGA CCGCACATTT AAAGCGGAAA
2051 ACTTCCATAT TCAGGGCGGA CAAGCGGTGG TTTCCCGCAA TGTTGCCAAA
2101 GTGGAAGGCG ATTGGCATTT AAGCAATCAC GCCCAAGCAG TTTTCGGTGT
2151 CGCACCGCAT CAAAGCCACA CAATCTGTAC ACGTTCGGAC TGGACGGGTC
2201 TGACAAGTTG TACCGAAAAA ACCATTACCG ACGATAAAGT GATTGCTTCA
2251 TTGAGCAAGA CCGACATCAG AGGCAATGTC AGCCTTGCCG ATCACGCTCA
2301 TTTAAATCTC ACAGGACTTG CCACACTCAA CGGCAATCTT AGTGCAGGCG
2351 GAGACACGCA CTATACGGTT ACGCGCAACG CCACCCAAAA CGGCAACCTC
2401 AGCCTCGTGG GCAATGCCCA AGCAACATTT AATCAAGCCA CATTAAACGG
2451 CAACACATCG GCTTCGGACA ATGCTTCATT TAATCTAAGC AACAACGCCG
2501 TACAAAACGG CAGTCTGACG CTTTCCGACA ACGCTAAGGC AAACGTAAGC
2551 CATTCCGCAC TCAACGGCAA TGTCTCCCTA GCCGATAAGG CAGTATTCCA
2601 TTTTGAAAAC AGCCGCTTTA CCGGAAAAAT CAGCGGCGGC AAGGATACGG
2651 CATTACACTT AAAAGACAGC GAATGGACGC TGCCGTCGGG CACGGAATTA
2701 GGCAATTTAA ACCTTGACAA CGCCACCATT ACACTCAATT CCGCCTATCG
2751 ACACGATGCG GCAGGCGCGC AAACCGGCAG TGCGGCAGAT GCGCCGCGCC
2801 GCCGTTCGCG CCGTTCCCTA TTATCCGTTA CGCCGCCAAC TTCGGCAGAA
2851 TCCCGTTTCA ACACGCTGAC GGTAAACGGC AAATTGAACG GTCAGGGAAC
2901 ATTCCGCTTT ATGTCGGAAC TCTTCGGCTA CCGCAGCGGC AAATTGAAGC
2951 TGGCGGAAAG TTCCGAAGGC ACTTACACCT TGGCTGTCAA CAATACCGGC
3001 AACGAACCCG TAAGTCTCGA GCAATTGACG GTAGTGGAAG GAAAAGACAA
3051 CACACCGCTG TCCGAAAATC TTAATTTCAC CCTGCaaaAc gaacacgtcg
3101 atgccggcgc atggCGTTAT CAGCTTATCC gcaaagacgG CGAGTTCCgc
3151 CTGCATAATC CGGTCAAAGA ACAAGAGCTT TCCGACAAAC TCGGCAAGgc
3201 gggagaaACA GAggccgccT TGACGGCAAA ACAGGCacaA CTTGCCGCCA
3251 AAcaacaggc ggaaaAAGAG AACgcgcaaa gccttgAcgc gctgattgcg
3301 gCcgggcgca atgccaccga AAAGGCAgaa agtgttgccg aaccgGCCCG
3351 GCAGGCAGGC GGGGAAAAtg ccgGCATTAT GCAGGCGGAG GAAGAGAAAA
3401 AACGGGTGCA GGCGGATAAA GACACCGCCT TGGCGAAACA GCGCGAAGCG
3451 GAAACCCGGC CGGCTACCAC CGCCTTCCCC CGCGCCCGCC GCGCCCGCCG
3501 GGATTTGCCG CAACCGCAGC CCCAACCGCA ACCCCAACCG CAGCGCGACC
3551 TGATCAGCCG TTATGCCAAT AGCGGTTTGA GTGAATTTTC CGCCACGCTC
3601 AACAGCGTTT TCGCCGTACA GGACGAATTG GACCGCGTGT TTGCCGAAGA
3651 CCGCCGCAAC GCCGTTTGGA CAAGCGGCAT CCGGGACACC AAACACTACC
3701 GTTCGCAAGA TTTCCGCGCC TACCGCCAAC AAACCGACCT GCGCCAAATC
3751 GGTATGCAGA AAAACCTCGG CAGCGGGCGC GTCGGCATCC TGTTTTCGCA
3801 CAACCGGACC GGAAACACCT TCGACGACGG CATCGGCAAC TCGGCACGGC
3851 TTGCCCACGG TGCCGTTTTC GGGCAATACG GCATCGGCAG GTTCGACATC
3901 GGCATCAGCG CGGGCGCGGG TTTTAGTAGC GGCAGCCTTT CAGACGGCAT
3951 CAGAGGCAAA ATCCGCCGCC GCGTGCTGCA TTACGGCATT CAGGCAAGAT
4001 ACCGCGCAGG TTTCGGCGGA TTCGGCATCG AACCGCACAT CGGCGCAACG
4051 CGCTATTTCG TCCAAAAAGC GGATTACCGA TACGAAAACG TCAATATCGC
4101 CACCCCGGGC CTTGCATTCA ACCGCTACCG CGCGGGCATT AAGGCAGATT
4151 ATTCATTCAA ACCGGCGCAA CACATTTCCA TCACGCCTTA TTTGAGCCTG
4201 TCCTATACCG ATGCCGCTTC CGGCAAAGTC CGAACGCGCG TCAATACCGC
4251 CGTATTGGCG CAGGATTTCG GCAAAACCCG CAGTGCGGAA TGGGGCGTAA
4301 ACGCCGAAAT CAAAGGTTTC ACGCTGTCCC TCCACGCTGC CGCCGCCAAG
4351 GGGCCGCAAT TGGAAGCGCA GCACAGCGCG GGCATCAAAT TAGGCTACCG
4401 CTGGTAA



This is predicted to encode a protein having amino acid sequence <SEQ ID 654>:

1 MKTTDKRTTE THRKAPKTGR IRFSPAYLAI CLSFGILPQA RAGHTYFGIN
51 YQYYRDFAEN KGKFAVGAKD IEVYNKKGEL VGKSMTKAPM IDFSVVSRNG
101 VAALAGDQYI VSVAHNGGYN NVDFGAEGSN PDQHRFSYQI VKRNNYKAGT
151 NGHPYGGDYH MPRLHKFVTD AEPVEMTSYM DGWKYADLNK YPDRVRIGAG
201 RQYWRSDEDE PNNRESSYHI ASAYSWLVGG NTFAQNGSGG GTVNLGSEKI
251 KHSPYGFLPT GGSFGDSGSP MFIYDAQKQK WLINGVLQTG NPYIGKSNGF
301 QLVRKDWFYD EXFAGDTHSV FYEPHQNGKY FFNDNNNGAG KIDAKHKHYS
351 LPYRLKTRTV QLFNVSLSET AREPVYHAAG GVNSYRPRLN NGENISFIDK
401 GKGELILTSN INQGAGGLYF EGNFTVSPKN NETWQGAGVH ISDGSTVTWK
451 VNGVANDRLS KIGKGTLLVQ AKGENQGSVS VGDGKVILDQ QADDQGKKQA
501 FSEIGLVSGR GTVQLNADNQ FNPDKLYFGF RGGRLDLNGH SLSFHRIQNT
551 DEGAMIVNHN QDKESTVTIT GNKDITTTGN NNNLDSKKEI AYNGWFGEKD
601 ATKTNGGLKL NYPPEEADRT LLLSGGTNLN GNITQTNGKL FFSGRPTPHA
651 YNHLGSGWSK MEGIPQGEIV WDNDWIDRTF KAENFHIQGG QAVVSRNVAK
701 VEGDWHLSNH AQAVFGVAPH QSHTICTRSD WTGLTSCTEK TITDDKVIAS
751 LSKTDVRGNV SLADHAHLNL TGLATFNGNL VQAETRTIRL RANATQNGNL
801 SLVGNAQATF NQATLNGNTS ASDNASFNLS NNAVQNGSLT LSDNAKANVS
851 HSALNGNVSL ADKAVFHFEN SRFTGKISGG KDTALHLKDS EWTLPSGTEL
901 GNLNLDNATI TLNSAYRHDA AGAQTGSAAD APRRRSRRSL LSVTPPTSAE
951 SRFNTLTVNG KLNGQGTFRF MSELFGYRSG KLKLAESSEG TYTLAVNNTG
1001 NEPVSLEQLT VVEGKDNTPL SENLNFTLQN EHVDAGAWRY QLIRKDGEFR
1051 LHNPVKEQEL SDKLGKAGET EAALTAKQAQ LAAKQQAEKD NAQSLDALIA
1101 AGRNATEKAE SVAEPARQAG GENAGIMQAE EEKKRVQADK DTALAKQREA
1151 ETRPATTAFP RARRARRDLP QPQPQPQPQP QRDLISRYAN SGLSEFSATL
1201 NSVFAVQDEL DRVFAEDRRN AVWTSGIRDT KHYRSQDFRA YRQQTDLRQI
1251 GMQKNLGSGR VGILFSHNRT GNTFDDGIGN SARLAHGAVF GQYGIGRFDI
1301 GISAGAGFSS GSLSDGIRGK IRRRVLHYGI QARYRAGFGG FGIEPHIGAT
1351 RYFVQKADYR YENVNIATPG LAFNRYRAGI KADYSFKPAQ HISITPYLSL
1401 SYTDAASGKV RTRVNTAVLA QDFGKTRSAE WGVNAEIKGF TLSLHAAAAK
1451 GPQLEAQHSA GIKLGYRW*



Underlined and double-underlined sequences represent the active site of a serine protease (trypsin 
family) and an ATP/GTP-binding site motif A (P-loop). 
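
Motifs of this kind can be located by a simple pattern scan over the predicted protein. The following is a minimal Python sketch under stated assumptions: the P-loop pattern is the PROSITE-style consensus [AG]-x(4)-G-K-[ST]; the serine protease pattern is deliberately reduced to the G-D-S-G core around the catalytic serine and is not the full trypsin-family signature; the function name `find_motifs` and the choice of test segment (residues 251-300 of SEQ ID 654) are illustrative. Which residues were underlined in the original cannot be recovered from this text, so the sketch only shows how candidate sites can be found.

```python
import re

# Simplified motif patterns (assumptions for illustration, see lead-in).
MOTIFS = {
    "P-loop (ATP/GTP-binding site motif A)": re.compile(r"[AG].{4}GK[ST]"),
    "serine protease active-site core (simplified)": re.compile(r"GDSG"),
}


def find_motifs(protein):
    """Return (motif name, 1-based start position, matched residues) tuples,
    sorted by start position."""
    hits = []
    for name, pattern in MOTIFS.items():
        for match in pattern.finditer(protein):
            hits.append((name, match.start() + 1, match.group()))
    return sorted(hits, key=lambda hit: hit[1])


# Residues 251-300 of SEQ ID 654, copied from the listing above.
segment = "KHSPYGFLPTGGSFGDSGSPMFIYDAQKQKWLINGVLQTGNPYIGKSNGF"
offset = 250  # so that reported positions refer to the full-length protein

for name, start, residues in find_motifs(segment):
    print(start + offset, residues, name)
```

A regular expression is enough here because both motifs are short, ungapped patterns; a profile-based search would be the more sensitive choice for divergent sequences.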

ORF1-1 and ORF1ng show 93.7% identity in 1471 aa overlap:
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730 740 750 760 770 780 

orf 1-1 . pep QSHTICTRSDWTGLTNCVEKTITDDKVIASLTKTDISGNVDLADHAHLNLTGLATLNGNL 
Illtlllllllilll:|:lltllllllllll:llll M I : I I I I t I t t I I I I M I I I M 
orflng-1 QSHTICTRSDWTGLTSCTEKTITDDKVIASLSKTDIRGNVSLADHAHLNLTGLATLNGNL 

730 740 750 760 770 780 
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35 



40 



orf 1-1. pep 
orflng-1 

orf 1-1. pep 
orflng-1 

orf 1-1. pep 
orflng-1 

orf 1-1 .pep 
orflng-1 

orf 1-1. pep 
orflng-1 

orf 1-1. pep 
orflng-1 



790 800 810 820 830 840 

SANGDTRYTVSHNATQNGNLSLVGNAQATFNQATLNGNTSASGNASFNLSDHAVQNGSLT 
lt:|||:|||::|llllllllllltlll[illlllllt]l)l I M I I I I : : I I t I i I i I 
SAGGDTHYTVTRNATQNGNLSLVGNAQATFNQATLNGNTSASDNASFNLSNNAVQNGSLT 

790 800 810 820 830 840 

850 860 870 880 890 900 

LSGNAKANVSHSALNGNVSLADKAVFHFESSRFTGQISGGKDTALHLKDSEWTLPSGTEL 
It illllllllillililIllillllll:illll:ltltllllll)ll)Mlllllltt 
LSDNAKANVSHSALNGNVSLADKAVFHFENSRFTGKISGGKDTALHLKDSEWTLPSGTEL 

850 860 870 880 890 900 

910 920 930 940 950 960 

GNLNLDNATITLNSAYRHDAAGAQTGSATDAPRRRSRRSRRSLLSVTPPTSVESRFNTLT 
lllllllltllltlMIIIIMIIIIIhllilllli I I I I I I I I I I I: i I I t I I I I 
GNLNLDNATITLNSAYRHDAAGAQTGSAADAPRRRSR RSLLSVTPPTSAESRFNTLT 

910 920 930 940 950 

970 980 990 1000 1010 1020 

VNGKLNGQGTFRFMSELFGYRSDKLKLAESSEGTYTLAVNNTGNEPASLEQLTWEGKDN 

I i I I t I I I n I I I t i I I I I I I I I I I I I I I I I 1 I I I I I I M I I I I I : i I I I I I I I I I I I I 
VNGKLNGQGTFRFMSELFGYRSGKLKLAESSEGTYTLAVNNTGNEPVSLEQLTWEGKDN 
960 970 980 990 1000 1010 

1030 1040 1050 1060 1070 

KPLSENLNFTLQNEHVDAGAWRYQLIRKDGEFRLHNPVKEQELSDKLGKA 

I I M I I I I I I I I I I i I I t I I I I I I i I I i I I { I M I i I I I I I i I I I t I I I 
TPLSENLNFTLQNEHVDAGAWRYQLIRKDGEFRLHNPVKEQELSDKLGKAGETEAALTAK 
1020 1030 1040 1050 1060 1070 

1080 1090 1100 1110 1120 

EAKKQAEKDNAQSLDALIAAGRDAVEKTESVAEPARQAGGENVGIMQAEEEKKRVQ 

I I: I I I I [ I I I I M I I M I I I : I : I I : I t I t M I M I M t I : I t I I M I I I I I I I 
QAQLAAKQQAEKDNAQSLDALIAAGRNATEKAESVAEPARQAGGENAGIMQAEEEKKRVQ 
1080 1090 1100 1110 1120 1130 



45 



1130 1140 1150 1160 1170 1180 

orf 1-1 . pep ADKDTALAKQREAETRPATTAFPRARRARRDLPQLQPQPQPQPQRDLISRYANSGLSEFS 
I I I I I I I i I I I 1) I I I t I I I I I i t I I t M I I I I I I I I I I I t I I I I I t I 1 t I I I I It I i I 
orflng-1 ADKDTALAKQREAETRPATTAFPRARRARRDLPQPQPQPQPQPQRDLISRYANSGLSEFS 
1140 1150 1160 1170 1180 1190 



50 



55 



60 



65 



1190 1200 1210 1220 1230 1240 

orf 1-1 . pep ATLNSVFAVQDELDRVFAEDRRNAVWTSGIRDTKHYRSQDFRAYRQQTDLRQIGMQKNLG 
I i I II M I I t I I I I I I I I I I I I I t I II I I I ) M I t I I It I I t I t II I 11 I II I I t II I I I 
orflng-1 ATLNSVFAVQDELDRVFAEDRRNAVWTSGIRDTKHYRSQDFRAYRQQTDLRQIGMQKNLG 
1200 1210 1220 1230 1240 1250 

1250 1260 1270 1280 1290 1300 

orf 1-1 . pep SGRVGILFSHNRTENTFDDGIGNSARLAHGAVFGQYGIDRFYIGISAGAGFSSGSLSDGI 
1 I II i I I M I II I I II M M 11 I II I I M I II I I I I I It I I 1 I 1 1 I I I I I I t [ I I I 1 
orflng-1 SGRVGILFSHNRTGNTFDDGIGNSARLAHGAVFGQYGIGRFDIGISAGAGFSSGSLSDGI 
1260 1270 1280 1290 1300 1310 

1310 1320 1330 1340 1350 1360 

orf 1-1 . pep GGKIRRRVLHYGIQARYRAGFGGFGIEPHIGATRYFVQKADYRYENVNIATPGLAFNRYR 
I I I I I I I I It I t I t I t t I I I I I I I t I I I II I I I I It I I I It t II I I t I II t 1 I I I I 1 I I 
orflng-1 RGKIRRRVLHYGIQARYRAGFGGFGIEPHIGATRYFVQKADYRYENVNIATPGLAFNRYR 
1320 1330 1340 1350 1360 1370 



70 



1370 1380 1390 1400 1410 1420 

orf 1-1 . pep AGIKADYSFKPAQHISITPYLSLSYTDAASGKVRTRVNTAVLAQDFGKTRSAEWGVNAEI 
I I I 1 1 t I t t I I I I t I I I I I I I I I 1 I I I t I I I I 1 t I I I I I I I I I I I I I II I i I I I I I I t I I 
orflng-1 AGIKADYSFKPAQHISITPYLSLSYTDAASGKVRTRVNTAVLAQDFGKTRSAEWGVNAEI 
1380 1390 1400 1410 1420 1430 
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1430 1440 1450 

orf 1-1 . pep KGFTLSLHAAAAKGPQLEAQHSAGIKLGYRWX 
I i I I I I I I I I I I I I I I t I I I i I I I I I I I I I t I 
orf lng-1 KGFTLSLHAAAAKGPQLEAQHSAGIKLGYRWX 
5 1440 1450 1460 



In addition, ORF1ng shows 55.7% identity with hap protein (P45387) over a 1455aa overlap:

SCORES Init1: 1104 Initn: 4632 Opt: 2680

Smith-Waterman score: 5165; 55.7% identity in 1455 aa overlap

10 20 30 40 50 60 

MKTTDKRTTETHRKAPKTGRIRFSPAYLAICLSFGILPQARAGHTYFGINYQYYRDFAEN 

I :i: |:|:lt: M I I I I M I I : t I I I I I t i I t 
MKKTVFRLNFLTACISLGIVSQAWAGHTYFGIDYQYYRDFAEN 
10 20 30 40 

70 80 90 100 110 120 

KGKFAVGAKDIEVYNKKGELVGKSMTKAPMIDFSWSRNGVAALAGDQYIVSVAHNGGYN 
I I I I : I I I : : t : i I I t : t : I I I I ) I I I I I I I I I I I I I I I i I t i : : I I I I I I I I I II: 
KGKETVGAQNIKVYNKQGQLVGTSMTKAPMIDFSWSRNGVAALVENQYIVSVAHNVGYT 
50 60 70 80 90 100 

130 140 150 160 170 180 

NVDFGAEGSNPDQHRFSYQIVKRNNYKAGTNGHPYGGDYHMPRLHKFVTDAEPVEMTSYM 

: I I I I M I : I It I I M: I : I I i I I I I I I IN It I I I II II II: I t :: I I I t 
DVDFGAEGNNPDQHRFTYKIVKRNNYKKD-NLHPYEDDYHNPRLHKFVTEAAPIDMTSNM 
110 120 130 140 150 160 

190 200 210 220 230 240 

DGWKYADLNKYPDRVRIGAGRQYWRSDEDEPNNRESSYHIASAYSWLVGGNTFAQNGSGG 
:| |:| :|||:fllli:tll:ll:l:l: : ::|:ll :|::||| I |:|: 

NGSTYSDRTKYPERVRIGSGRQFWRNDQDKGD QVAGAYHYLTAGNTHNQRGAGN 

170 180 190 200 210 

250 260 270 280 290 300 

GTVNLGSEKIKHSPYGFLPTGGSFGDSGSPMFIYDAQKQKWLINGVLQTGNPYIGKSNGF 
I 11:: I : t I M : I I I I I I I I I I t I I I : I I I I I I II : I : I I I : I I i I I 
GYSYLGGDVI^GEYGPLPIAGSKGDSGSPMFIYDAEKQECWLINGXLREGNPFEGKENGF 
220 230 240 250 260 270 

310 320 330 340 350 360 

QLVRKDWFYDEIFAGDTHSVFYEPHQNGKYFFNDNNNGAGKIDAKHKHYSLPYRLKTRTV 
lllll::| III! I I: :t II I :: |:|l l:t t ::) ::l : 

QLVRKSYF-DEIFERDLHTSLYTRAGNGVYTISGNDNGQGSITQKS GIPSEIK 1 

280 290 300 310 320 

370 380 390 400 410 419 

QLFNVSLSETAREPVYHAA-GGVNSYRPRLNNGENISFIDKGKGELILTSNINQGAGGLY 
I 1:11 :: I:: III lillttt:: |:l: :t II : : I : M I I I I I ! I 
TLANMSLPLKEKDKVHNPRYDGPNIYSPRLNNGETLYFMDQKQGSLIFASDINQGAGGLY 
330 340 350 360 370 380 

420 430 440 450 460 470 479 

orf lng-1 . pep FEGNFTVSPBCNNETWQGAGVHISDGSTVTWKVNGVANDRLSKIGKGTLLVQAKGENQGSV 
55 I I I I I I I I t :: I : I I I II I : I : I :: I II I I I I II I : I t I I I t I I I I I I I I I I I I : II : 

p45387 FEGNFTVSPNSNQTWQGAGIHVSENSTVTWKVNGVEHDRLSKIGKGTLHVQAKGENKGSI 

390 400 410 420 430 440 

480 490 500 510 520 530 539 

60 orf lng-1 . pep SVGDGKVILDQQADDQGKKQAFSEIGLVSGRGTVQLNADNQFNPDKLYFGFRGGRLDLNG 

I I I I i I I I I : I M I I I I : II I II I I II I I I t I I II I I I : I I : I I : I I I I I i II I I I I I 
p45387 SVGDGKVILEQQADDQGNKQAFSEIGLVSGRGTVQLNDDKQFDTDKFYFGFRGGRLDLNG 
450 460 470 480 490 500 

65 540 550 560 570 580 590 

orf lng-1 . pep HSLSFHRIQNTDEGAMIVNHNQDKESTVTITGNKDITT-TGNN-NNLDSKKEIAYNGWFG 
ll|:|:||||||||||||||| : ::llllll::l: :lll l:ll :|||||||lll 
p45387 HSLTFKRIQNTDEGAMIVNHKTTQAANVTITGNESIVLPNGNNINKLDYRKEIAYNGWFG 
510 520 530 540 550 560 



15 



orf lng-1. pep 
p45387 



orf lng-1. pep 
20 p45387 



orf lng-1. pep 

25 

p45387 



30 orf lng-1. pep 

p45387 

35 

orf lng-1. pep 
p45387 

40 

orf lng-1. pep 
p45387 

45 

orf lng-1 .pep 
50 p45387 
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15 



20 



600 610 620 630 640 650 

orf lng-1 . pep EKDATKTNGRLNLNYQPEEADRTLLLSGGTNLNGNITQTNGKLFFSGRPTPHAYNHLGSG 

I I :[ nun in n n n n n n n n n i n n n n n n n n i n : 

p45387 ETDKNKHNGRLNLIYKPTTEDRTLLLSGGTNLKGDITQTKGKLFFSGRPTPHAYNHLNKR 
570 580 590 600 610 620 

660 670 680 690 700 710 

orf lng-1 . pep WSKMEGIPQGEIVWDNDWIDRTFKAENFHIQGGQAWSRNVAKVEGDWHLSNHAQAVFGV 

nninnnnnnnnnnntnnnnnnninin ntnnnn 

p45387 WSEMEGIPQGEIVWDHDWINRTFKAENFQIKGGSAWSFNVSSIEGNWTVSNNANATFGV 
630 640 650 660 670 680 

720 730 740 750 760 770 

orf lng-1 . pep APHQSHTICTRSDWTGLTSCTEKTITDDKVIASLSKTDIRGNVSLADHAHLNLTGLATLN 

nn:nnninnnn : ni ni n nn n:nnn n ni n 

p45387 VPNQQNTICTRSDWTGLTTCQKVDLTDTKVINSIPKTQINGSINLTDNATANVKGLAKLN 
690 700 710 720 730 740 

780 790 800 810 820 830 

orf lng-1. pep GNLSAGGDTHYTVTRNATQNGNLSLVGNAQATFNQATLNGNTSASDNASFNLSNNAVQNG 
\\:: | 

p45387 GNVTL TNHSQFTLSNNATQIG 

750 760 770 



25 



30 



35 



840 850 860 870 880 890 

orf lng-1 . pep SLTLSDNAKANVSHSALNGNVSLADKAVFHFENSRFTGKISGGKDTALHLKDSEWTLPSG 

:: inn ni::: inii inn I :nin: nn I n: i::: iini 

p45387 NIRLSDNSTATVDNANLNGNVHLTDSAQFSLKNSHFSHQIQGDKGTTVTLENATWTMPSD 

780 790 800 810 820 830 

900 910 920 930 940 950 

orf lng-1 . pep TELGNLNLDNATITLNSAYRHDAAGAQTGSAADAPRRRSRRSLLSVTPPTSAESRFNTLT 

I ) n n n n n n II I : n : n n ii i : i inn ii n ii 

p45387 TTLQNLTLNNSTITLNSAY SASSNNTPEIRRS LETETTPTSAEHRFNTLT 

840 850 860 870 



40 



960 970 980 990 1000 1010 

orf lng-1 . pep VNGKLNGQGTFRFMSELFGYRSGKLKLAESSEGTYTLAVNNTGNEPVSLEQLTWEGKDN 

II II I n 11 It n I linn n n : : :n i i in i n n i n 1 1 nn i n ii 

p45387 VNGKLSGQGTFQFTSSLFGYKSDKLKLSNDAEGDYILSVRNTGKEPETLEQLTLVESKDN 
880 890 900 910 920 930 



45 



1020 1030 1040 1050 1060 1070 

orf lng-1 . pep TPLSENLNFTLQNEHVDAGAWRYQLIRKDGEFRLHNPVKEQELSDKLGKAGETEAALTAK 

iinnnnnnniii 1 1 n : : n n n ii n n ii ii : i n :n n ii 

p45387 QPLSDKLKFTLENDHVDAGALRYKLVKNDGEFRLHNPIKEQELHNDLVRAEQAERTLEAK 
940 950 960 970 980 990 



50 



55 



60 



65 



1080 1090 1100 1110 1120 1130 

orf lng-1 . pep QAQLAAKQQAEKDNAQSLDALIAAGRNAT-EKAESVAEPARQAGGENAGIMQAEEEKKRV 
I:: ni I: : : : : \ I 11 :: ::: I ini :\ :::: : hi 
p45387 QVEPTAKTQTGEPKVRSRRAARAAFPDTLPDQSLLNALEAKQAE-LTAETQKSKAKTKKV 
1000 1010 1020 1030 1040 1050 

1140 1150 1160 1170 1180 1190 

orf lng-1 . pep QADK DTALAKQREAETRPATTAFPRARRARRD-LPQPQPQPQPQPQRDLISRYANSG 

: : : : I I : I : : : : : : n I I : : I : in n 11 I n I : 

p45387 RSKRAVFSDPLLDQSLFALEAALEVIDAPQQSEKDRLAQEEAEKQ-RKQKDLISRYSNSA 
1060 1070 1080 1090 1100 1110 

1200 1210 1220 1230 1240 1250 

orf lng-1 . pep LSEFSATLNSVFAVQDELDRVFAEDRRNAVWTSGIRDTKHYRSQDFRAYRQQ-TDLRQIG 
inn lini::nini nn::: ::ini: :| ::| I: llinn 1:11111 
p45387 LSELSATVNSMLSVQDELDRLFVDQAQSAVWTNIAQDKRRYDSDAFRAYQQQKTNLRQIG 
1120 1130 1140 1150 1160 1170 



70 



1260 1270 1280 1290 1300 1310 

orf lng-1 . pep MQKNLGSGRVGILFSHNRTGNTFDDGIGNSARLAHGAVFGQYGIGRFDIGISAGAGFSSG 
:|| |::||:| :|||:|: nil: : i I I: : IHI I : : : I : : : I : I : I : : 
p45387 VQKALANGRIGAVFSHSRSDNTFDEQVKNHATLTMMSGFAQYQWGDLQFGVNVGTGISAS 
1180 1190 1200 1210 1220 1230 
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1320 1330 1340 1350 1360 1370 

orf lng-1 . pep SLSDGIRGKIRRRVLHYGIQARYRAGFGGFGIEPHIGATRYFVQKADYRYENVNIATPGL 
: ; : : | I : | : : : : I I : : I I : : I : I I : I : : I : : I I I : : : : I : I : I : 1 I : I 
p45387 KMAEEQSRKIHRKAINYGVNASYQFRLGQLGIQPYFGVNRYFIERENYQSEEVRVKTPSL 
1240 1250 1260 1270 1280 1290 

1380 1390 1400 1410 1420 1430 

orf lng-1 . pep AFNRYRAGIKADYSFKPAQHISITPYLSLSYTDAASGKVRTRVNTAVLAQDFGKTRSAEW 

MM) Ml::ll:l ll: :: I :!::::: I : I M Ml I M: : I 

p45387 AFNRYNAGIRVDYTFTPTDNISVKPYFEVNYVDVSNANVQTTVNLTVLQQPFGRYWQKEV 
1300 1310 1320 1330 1340 1350 

1440 1450 1460 1469 

orf lng-1 . pep GVNAEIKGFTLSLHAA7VAKGPQLEAQHSAGIKLGYRWX 

|:MM I M : :M M |::MMMMI 
p45387 GLKAEILHFQISAFISKSQGSQLGKQQNVGVKLGYRW 
1360 1370 1380 1390 
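
The Smith-Waterman score reported for this comparison comes from a local-alignment dynamic programme. The following minimal Python sketch illustrates the recurrence only. It assumes a flat match/mismatch scheme and a linear gap penalty (the parameters `match`, `mismatch` and `gap` are hypothetical), whereas the program that produced the scores above would have used an amino acid substitution matrix and its own gap costs, so the sketch will not reproduce the value 5165.

```python
def smith_waterman_score(a, b, match=2, mismatch=-1, gap=-2):
    """Best local-alignment score of sequences a and b (linear gap penalty)."""
    prev = [0] * (len(b) + 1)
    best = 0
    for i in range(1, len(a) + 1):
        curr = [0] * (len(b) + 1)
        for j in range(1, len(b) + 1):
            diagonal = prev[j - 1] + (match if a[i - 1] == b[j - 1] else mismatch)
            # A local alignment may restart anywhere, hence the floor at zero.
            curr[j] = max(0, diagonal, prev[j] + gap, curr[j - 1] + gap)
            best = max(best, curr[j])
        prev = curr
    return best


# Short stretch taken from the first alignment block above:
# ORF1ng (AGHTYFGINYQYYRDFAEN) against hap (AGHTYFGIDYQYYRDFAEN).
print(smith_waterman_score("AGHTYFGINYQYYRDFAEN", "AGHTYFGIDYQYYRDFAEN"))
```

Only two rows of the scoring matrix are kept in memory, which is sufficient when the score alone is needed and no traceback is performed.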

Based on this analysis, it is predicted that these proteins from N.meningitidis and N.gonorrhoeae,
and their epitopes, could be useful antigens for vaccines or diagnostics, or for raising antibodies.

Example 78 

The following partial DNA sequence was identified in N.meningitidis <SEQ ID 655>: 

1 ..AAGGTGTGGC AATTTGTCGA AGA.CCGCTG CGTGCCGTCG TGCCTGCCGA

51 CAGTTTTGAA CCGACCGCGC AAAAATTGAA CCTGTTTAAG GCGGGTGCGG 

101 CAACCATTTT GTTTTATGAA GATCAAAATG TCGTCAAAGG TTTGCAGGAG 

151 CAGTTCCCTG CTTATGCCGC TAACTTCCCC GTTTGGGCGg ATCAGGCAAA 

201 CGCGATGGTG CAGTATGCCG TTTGGACGAC ACTTGCCGCG GTCGGCGTAG 

251 GTGCAAACCT GCAACATTAC AATCCCTTGC CCGATGCGGC GATTGCCAAA 

301 GCGTGGAATA TCCCCGAAAA CTGGTTGTTG CGCGCACAAA TGGTTATCGG 

351 CGGTATTGAA GGGGCGGCAG GTGAAAAGAC CTTTGAACCC GTTGCAGAAC 

401 GTTTGAAAGT GTTCGGCGCA TAA 

This corresponds to the amino acid sequence <SEQ ID 656; ORF6>: 

1 ..KVWQFVEXPL RAVVPADSFE PTAQKLNLFK AGAATILFYE DQNVVKGLQE
51 QFPAYAANFP VWADQANAMV QYAVWTTLAA VGVGANLQHY NPLPDAAIAK 
101 AWNIPENWLL RAQMVIGGIE GAAGEKTFEP VAERLKVFGA * 
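
The correspondence between a partial DNA sequence and its amino acid sequence can be checked by a plain in-frame translation. The following is a minimal Python sketch, assuming the standard genetic code and reading frame 1; the function name `translate` and the convention of emitting `X` for any codon that contains a non-ACGT character are illustrative choices, not part of the original analysis.

```python
from itertools import product

# Standard genetic code, codons enumerated in TCAG order.
BASES = "TCAG"
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    "".join(codon): amino_acid
    for codon, amino_acid in zip(product(BASES, repeat=3), AMINO_ACIDS)
}


def translate(dna):
    """Translate a DNA string in frame 1; whitespace is ignored, a trailing
    incomplete codon is dropped, and unrecognised codons become 'X'."""
    sequence = "".join(dna.split()).upper()
    protein = []
    for i in range(0, len(sequence) - 2, 3):
        protein.append(CODON_TABLE.get(sequence[i:i + 3], "X"))
    return "".join(protein)


# Bases 11-60 of SEQ ID 659, which start on a codon boundary.
fragment = "AATCTCTGCA ACAGGCTGCC GAAAGCCGCC GTTCCATTTA TTCGTTAAAT"
print(translate(fragment))
```

Building the codon table from the 64-character string keeps the sketch self-contained; an ambiguous base, such as the `.` in the first sequence of this example, simply yields `X` at that position.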

Further sequence analysis revealed a further partial DNA sequence <SEQ ID 657>: 

1 ..CTGCGTGCCG TCGTGCCTGC CGACAGTTTT GAACCGACCG CGCAAAAATT

51 GAACCTGTTT AAGGCGGGTG CGGCAACCAT TTTGTTTTAT GAAGATCAAA 

101 ATGTCGTCAA AGGTTTGCAG GAGCAGTTCC CTGCTTATGC CGCTAACTTC 

151 CCCGTTTGGG CGGATCAGGC AAACGCGATG GTGCAGTATG CCGTTTGGAC 

201 GACACTTGCC GCGGTCGGCG TAGGTGCAAA CCTGCAACAT TACAATCCCT 

251 TGCCCGATGC GGCGATTGCC AAAGCGTGGA ATATCCCCGA AAACTGGTTG 

301 TTGCGCGCAC AAATGGTTAT CGGCGGTATT GAAGGGGCGG CAGGTGAAAA 

351 GACCTTTGAA CCCGTTGCAG AACGTTTGAA AGTGTTCGGC GCATAA 

This corresponds to the amino acid sequence <SEQ ID 658; ORF6-1>:

1 ..LRAVVPADSF EPTAQKLNLF KAGAATILFY EDQNVVKGLQ EQFPAYAANF
51 PVWADQANAM VQYAVWTTLA AVGVGANLQH YNPLPDAAIA KAWNIPENWL
101 LRAQMVIGGI EGAAGEKTFE PVAERLKVFG A*
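
The correspondence between the nucleotide sequence <SEQ ID 657> and the amino acid sequence <SEQ ID 658> can be checked by translating the DNA in frame with the standard genetic code. The following is only a minimal sketch: the helper name `translate` and the table layout are illustrative assumptions, not part of the original analysis. It shows that the first printed row of SEQ ID 657 encodes two consecutive valines (LRAVVPADSF...).

```python
from itertools import product

# Standard genetic code; codons are enumerated TTT, TTC, TTA, TTG, TCT, ...
# (bases in T, C, A, G order, third position varying fastest).
BASES = "TCAG"
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    "".join(codon): aa
    for codon, aa in zip(product(BASES, repeat=3), AMINO_ACIDS)
}


def translate(dna):
    """Translate a DNA string in reading frame 0 (hypothetical helper).

    Whitespace is ignored, codons with unknown bases become 'X',
    stop codons become '*', and a trailing partial codon is dropped.
    """
    seq = "".join(dna.split()).upper()
    return "".join(
        CODON_TABLE.get(seq[i:i + 3], "X") for i in range(0, len(seq) - 2, 3)
    )


# First row of SEQ ID 657 as printed (50 nt, so 16 complete codons).
print(translate("CTGCGTGCCG TCGTGCCTGC CGACAGTTTT GAACCGACCG CGCAAAAATT"))  # LRAVVPADSFEPTAQK
# Last 18 nt of SEQ ID 657, ending in the TAA stop codon.
print(translate("AAAGTGTTCGGCGCATAA"))  # KVFGA*
```

Unknown bases (printed as "." or "n" in the partial sequences) fall through to 'X', which mirrors the X residues shown in the partial ORFs.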

Computer analysis of this amino acid sequence gave the following results: 
Homology with a predicted ORF from N.meningitidis (strain A)

ORF6 shows 98.6% identity over a 140aa overlap with an ORF (ORF6a) from strain A of N.
meningitidis:

                                                10        20        30
orf6.pep                                KVWQFVEXPLRAVVPADSFEPTAQKLNLFK
                                        |||||||  |||||||||||||||||||||
orf6a     QIVEHAVLHTPSSFNSQSARVVVLFGEEHDKVWQFVEDALRAVVPADSFEPTAQKLNLFK
                 40        50        60        70        80        90

                  40        50        60        70        80        90
orf6.pep  AGAATILFYEDQNVVKGLQEQFPAYAANFPVWADQANAMVQYAVWTTLAAVGVGANLQHY
          ||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
orf6a     AGAATILFYEDQNVVKGLQEQFPAYAANFPVWADQANAMVQYAVWTTLAAVGVGANLQHY
                100       110       120       130       140       150

                 100       110       120       130       140
orf6.pep  NPLPDAAIAKAWNIPENWLLRAQMVIGGIEGAAGEKTFEPVAERLKVFGAX
          |||||||||||||||||||||||||||||||||||||||||||||||||||
orf6a     NPLPDAAIAKAWNIPENWLLRAQMVIGGIEGAAGEKTFEPVAERLKVFGAX
                160       170       180       190       200

The complete length ORF6a nucleotide sequence <SEQ ID 659> is:

1 ATGACCCGTC AATCTCTGCA ACAGGCTGCC GAAAGCCGCC GTTCCATTTA 

51 TTCGTTAAAT AAAAATCTGC CCGTCGGCAA AGATGAAATC GTCCAAATCG 

101 TCGAACACGC CGTTTTGCAC ACACCTTCTT CGTTCAATTC CCAATCTGCC

151 CGTGTGGTCG TGCTGTTTGG CGAAGAGCAT GATAAGGTGT GGCAATTTGT 

201 CGAAGACGCG CTGCGTGCCG TCGTGCCTGC CGACAGTTTT GAACCGACCG 

251 CGCAAAAATT GAACCTGTTT AAGGCGGGTG CGGCAACTAT TTTGTTTTAT 

301 GAAGATCAAA ATGTCGTCAA AGGTTTGCAG GAGCAGTTCC CTGCTTATGC 

351 CGCCAACTTT CCCGTTTGGG CGGACCAGGC GAACGCGATG GTGCAGTATG

401 CCGTTTGGAC GACACTTGCC GCGGTCGGCG TAGGTGCAAA CCTGCAACAT 

451 TACAATCCCT TGCCCGATGC GGCGATTGCC AAAGCGTGGA ATATCCCCGA 

501 AAACTGGTTG TTGCGCGCAC AAATGGTTAT CGGCGGTATT GAAGGGGCGG 

551 CAGGTGAAAA GACCTTTGAA CCAGTTGCAG AACGTTTGAA AGTGTTCGGC 

601 GCATAA

This is predicted to encode a protein having amino acid sequence <SEQ ID 660>: 

1 MTRQSLQQAA ESRRSIYSLN KNLPVGKDEI VQIVEHAVLH TPSSFNSQSA 

51 RVVVLFGEEH DKVWQFVEDA LRAVVPADSF EPTAQKLNLF KAGAATILFY

101 EDQNVVKGLQ EQFPAYAANF PVWADQANAM VQYAVWTTLA AVGVGANLQH

151 YNPLPDAAIA KAWNIPENWL LRAQMVIGGI EGAAGEKTFE PVAERLKVFG

201 A* 



ORF6a and ORF6-1 show 100.0% identity in 131 aa overlap:

                  50        60        70        80        90       100
orf6a.pep TPSSFNSQSARVVVLFGEEHDKVWQFVEDALRAVVPADSFEPTAQKLNLFKAGAATILFY
                                        ||||||||||||||||||||||||||||||
orf6-1                                  LRAVVPADSFEPTAQKLNLFKAGAATILFY
                                                10        20        30

                 110       120       130       140       150       160
orf6a.pep EDQNVVKGLQEQFPAYAANFPVWADQANAMVQYAVWTTLAAVGVGANLQHYNPLPDAAIA
          ||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
orf6-1    EDQNVVKGLQEQFPAYAANFPVWADQANAMVQYAVWTTLAAVGVGANLQHYNPLPDAAIA
                  40        50        60        70        80        90

                 170       180       190       200
orf6a.pep KAWNIPENWLLRAQMVIGGIEGAAGEKTFEPVAERLKVFGAX
          ||||||||||||||||||||||||||||||||||||||||||
orf6-1    KAWNIPENWLLRAQMVIGGIEGAAGEKTFEPVAERLKVFGAX
                 100       110       120       130



Homology with a predicted ORF from N.gonorrhoeae

ORF6 shows 95.7% identity over a 140aa overlap with a predicted ORF (ORF6ng) from
N.gonorrhoeae:

orf6.pep                                  KVWQFVEXPLRAVVPADSFEPTAQKLNLFK   30
                                          |||||||  |||||||||||||||||:|||
orf6ng      SNVSLDMSNPTVLRMGLPLYIASLRRGAIYKVWQFVEDALRAVVPADSFEPTAQKLKLFK   64

orf6.pep    AGAATILFYEDQNVVKGLQEQFPAYAANFPVWADQANAMVQYAVWTTLAAVGVGANLQHY   90
            ||||||||||||||||||||||||||||||||||||||||||||||||||||:|||||||
orf6ng      AGAATILFYEDQNVVKGLQEQFPAYAANFPVWADQANAMVQYAVWTTLAAVGAGANLQHY  124

orf6.pep    NPLPDAAIAKAWNIPENWLLRAQMVIGGIEGAAGEKTFEPVAERLKVFGA  140
            |||||:||||||||||||||||||||||||||||||:|||||||||||||
orf6ng      NPLPDVAIAKAWNIPENWLLRAQMVIGGIEGAAGEKVFEPVAERLKVFGA  174
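
The identity figure quoted for ORF6 and ORF6ng can be reproduced by counting identical positions over the gapless 140aa overlap. The sketch below is illustrative only: `percent_identity` is a hypothetical helper, and the two strings are transcribed from the alignment above on the assumption that the ambiguous `W` in RAWPADSF and QNWKGLQ stands for `VV`, as the nucleotide sequences encode two valines there. The unknown residue X is counted as a mismatch.

```python
def percent_identity(a, b):
    """Return (matches, percent) for two pre-aligned, equal-length sequences."""
    if len(a) != len(b):
        raise ValueError("sequences must be aligned to the same length")
    matches = sum(x == y for x, y in zip(a, b))
    return matches, 100.0 * matches / len(a)


# ORF6 residues 1-140 (partial sequence, X = unknown residue).
ORF6 = "".join([
    "KVWQFVEXPL", "RAVVPADSFE", "PTAQKLNLFK", "AGAATILFYE", "DQNVVKGLQE",
    "QFPAYAANFP", "VWADQANAMV", "QYAVWTTLAA", "VGVGANLQHY", "NPLPDAAIAK",
    "AWNIPENWLL", "RAQMVIGGIE", "GAAGEKTFEP", "VAERLKVFGA",
])
# ORF6ng residues 35-174, the region overlapping ORF6.
ORF6NG = "".join([
    "KVWQFVEDAL", "RAVVPADSFE", "PTAQKLKLFK", "AGAATILFYE", "DQNVVKGLQE",
    "QFPAYAANFP", "VWADQANAMV", "QYAVWTTLAA", "VGAGANLQHY", "NPLPDVAIAK",
    "AWNIPENWLL", "RAQMVIGGIE", "GAAGEKVFEP", "VAERLKVFGA",
])

matches, pct = percent_identity(ORF6, ORF6NG)
print(f"{matches}/{len(ORF6)} identical = {pct:.1f}%")  # 134/140 identical = 95.7%
```

The six differing positions are X/D, P/A, N/K, V/A, A/V and T/V, which gives 134/140 and agrees with the reported 95.7%.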



The complete length ORF6ng nucleotide sequence <SEQ ID 661> was identified as:

1 ATGGCCGTTG CGTCAAATGT CAGCTTGGAT ATGTCCAATC CTACGGTGTT 

51 ACGCATGGGA TTACCCTTAT ATATTGCGTC CCTAAGAAGG GGCGCAATAT 

101 ATAAGGTGTG GCAATTTGTC GAAGACGCGC TGCGTGCCGT CGTGCCTGCC 

151 GACAGTTTTG AACCGACCGC GCAAAAATTG AAGCTGTTTA AGGCGGGCGC 

201 GGCAACCATT TTGTTTTATG AAGATCAAAA TGTCGTCAAA GGTTTGCAGG 

251 AGCAGTTCCC TGCTTATGCC GCCAACTTTC CCGTTTGGGC GGACCAGGCG 

301 AACGCTATGG TACAGTATGC CGTCTGGACG ACACTTGCCG CGGTCGGTGC 

351 AGGTGCAAAT CTGCAACATT ACAACCCCTT GCCCGATGTG GCGATTGCTA 

401 AAGCGTGGAA TATTCCCGAA AACTGGCTGT TGCGCGCGCA AATGGTTATC 

451 GGTGGTATTG AAGGGGcggc aggtgaaaaa gtctttgaac CCGTTGCgga 

501 acgtttgAAA GTGTTCGGCG CATAA 

This encodes a protein having amino acid sequence <SEQ ID 662>:

1 MAVASNVSLD MSNPTVLRMG LPLYIASLRR GAIYKVWQFV EDALRAVVPA

51 DSFEPTAQKL KLFKAGAATI LFYEDQNVVK GLQEQFPAYA ANFPVWADQA

101 NAMVQYAVWT TLAAVGAGAN LQHYNPLPDV AIAKAWNIPE NWLLRAQMVI 

151 GGIEGAAGEK VFEPVAERLK VFGA* 






ORF6ng and ORF6-1 show 96.9% identity in 131 aa overlap:

                                                  10        20        30
orf6-1.pep                                LRAVVPADSFEPTAQKLNLFKAGAATILFY
                                          |||||||||||||||||:||||||||||||
orf6ng      PTVLRMGLPLYIASLRRGAIYKVWQFVEDALRAVVPADSFEPTAQKLKLFKAGAATILFY
                 20        30        40        50        60        70

                    40        50        60        70        80        90
orf6-1.pep  EDQNVVKGLQEQFPAYAANFPVWADQANAMVQYAVWTTLAAVGVGANLQHYNPLPDAAIA
            |||||||||||||||||||||||||||||||||||||||||||:||||||||||||:|||
orf6ng      EDQNVVKGLQEQFPAYAANFPVWADQANAMVQYAVWTTLAAVGAGANLQHYNPLPDVAIA
                 80        90       100       110       120       130

                   100       110       120       130
orf6-1.pep  KAWNIPENWLLRAQMVIGGIEGAAGEKTFEPVAERLKVFGAX
            |||||||||||||||||||||||||||:||||||||||||||
orf6ng      KAWNIPENWLLRAQMVIGGIEGAAGEKVFEPVAERLKVFGAX
                140       150       160       170

It is predicted that the proteins from N.meningitidis and N.gonorrhoeae, and their epitopes, could
be useful antigens for vaccines or diagnostics, or for raising antibodies.



Example 79 

The following partial DNA sequence was identified in N.meningitidis <SEQ ID 663>:

1 . . GGCTACAACT ACCTGTTCGC GCGCGGCAGC CGCATCGCCA actaccaaat 

51 CAACGGCATC CCCGTTGCCG ACGCGCTGGC CGATACGGG^ caatgccaac 

101 accgccgcct atgagcgcgt agaagtcgtg cgcggcgtgg cggggctgct 

151 ggacggcacg ggcgagcctt ccgccaccgt caatctggtg cgcaaacgcc 

201 tgacccgcaa gccattgttt gaagtccgcg ccgaagcggg caaccgcaaa 
251 CATTTCGGGC TGGACGCGGA CGTATCGGGC AGCCTGAACA CCGAAG.crC
301 rCTGCGCgGC CGCCTGGTTT CCAcCTTCGG ACGCGGCGAC TCGTGGCGGC 
351 GGCGCGAACG CAGCCGskAT GCCGAACTCT ACGGCATTTT GGAATACGAC 
401 ATCGCACCGC AAACCCGCGT CCACGCArGC ATGGACTACC AGCAGGCGAA 
451 AGAAACCGCC GACGCGCCGC TCAGcTACGC CGTGTACGAC AGCCAAGGTT 
501 ATGCCACCGC CTTCGGCCCG AAAGACAACC CCGCCACAAA TTGGGCGAAC 
551 AGCCACCACC GTGCGCTCAA CCTGTTCGCC GGCATCGAAC ACCGCTTCAA
601 CCAAGACTGG AAACTCAAAG CCGAATACGA CTAC. . 

This corresponds to the amino acid sequence <SEQ ID 664; ORF23>:

1 ..GYNYLFARGS RIANYQINGI PVADALADTG NANTAAYERV EVVRGVAGLL
51 DGTGEPSATV NLVRKRLTRK PLFEVRAEAG NRKHFGLDAD VSGSLNTEXX 
101 LRGRLVSTFG RGDSWRRRER SRXAELYGIL EYDIAPQTRV HAXMDYQQAK 
151 ETADAPLSYA VYDSQGYATA FGPKDNPATN WANSHHRALN LFAGIEHRFN 
201 QDWKLKAEYD Y. . 

Further work revealed the complete nucleotide sequence <SEQ ID 665>: 

1 ATGACACGCT TCAAATATTC CCTGCTGTTT GCCGCCCTGT TGCCCGTGTA 

51 CGCGCAGGCC GATGTTTCTG TTTCAGACGA CCCCAAACCG CAGGAAAGCA 

101 CTGAATTGCC GACCATCACC GTTACCGCCG ACCGCACCGC GAGTTCCAAC 

151 GACGGCTACA CTGTTTCCGG CACGCACACC CCGCTCGGGC TGCCCATGAC 

201 CCTGCGCGAA ATCCCGCAGA GCGTCAGCGT CATCACATCG CAACAAATGC 

251 GCGACCAAAA CATCAAAACG CTCGACCGCG CCCTGTTGCA GGCGACCGGC 

301 ACCAGCCGCC AGATTTACGG CTCCGACCGC GCGGGCTACA ACTACCTGTT 

351 CGCGCGCGGC AGCCGCATCG CCAACTACCA AATCAACGGC ATCCCCGTTG 

401 CCGACGCGCT GGCCGATACG GGCAATGCCA ACACCGCCGC CTATGAGCGC 

451 GTAGAAGTCG TGCGCGGCGT GGCGGGGCTG CTGGACGGCA CGGGCGAGCC 

501 TTCCGCCACC GTCAATCTGG TGCGCAAACG CCTGACCCGC AAGCCATTGT 

551 TTGAAGTCCG CGCCGAAGCG GGCAACCGCA AACATTTCGG GCTGGACGCG 

601 GACGTATCGG GCAGCCTGAA CACCGAAGGC ACGCTGCGCG GCCGCCTGGT 

651 TTCCACCTTC GGACGCGGCG ACTCGTGGCG GCGGCGCGAA CGCAGCCGCG 

701 ATGCCGAACT CTACGGCATT TTGGAATACG ACATCGCACC GCAAACCCGC 

751 GTCCACGCAG GCATGGACTA CCAGCAGGCG AAAGAAACCG CCGACGCGCC 

801 GCTCAGCTAC GCCGTGTACG ACAGCCAAGG TTATGCCACC GCCTTCGGCC 

851 CGAAAGACAA CCCCGCCACA AATTGGGCGA ACAGCCGCCA CCGTGCGCTC 

901 AACCTGTTCG CCGGCATCGA ACACCGCTTC AACCAAGACT GGAAACTCAA 

951 AGCCGAATAC GACTACACCC GCAGCCGCTT CCGCCAGCCC TACGGCGTAG 

1001 CAGGCGTGCT TTCCATCGAC CACAACACCG CCGCCACCGA CCTGATTCCC 

1051 GGTTATTGGC ACGCCGACCC GCGCACCCAC AGCGCCAGCG TGTCATTGAT 

1101 CGGCAAATAC CGCCTGTTCG GCCGCGAACA CGATTTAATC GCGGGTATCA 

1151 ACGGTTACAA ATACGCCAGC AACAAATACG GCGAACGCAG CATCATCCCC 

1201 AACGCCATTC CCAACGCCTA CGAATTTTCC CGCACGGGTG CCTACCCGCA 

1251 GCCTGCATCG TTTGCCCAAA CCATCCCGCA ATACGGCACC AGGCGGCAAA 

1301 TCGGCGGCTA TCTCGCCACC CGTTTCCGCG CCGCCGACAA CCTTTCGCTG 

1351 ATTTTGGGCG GACGATACAC CCGTTACCGC ACCGGCAGCT ACGACAGCCG 

1401 CACACAAGGC ATGACCTATG TGTCCGCCAA CCGTTTCACC CCCTACACAG 

1451 GCATCGTGTT CGACCTGACC GGCAACCTGT CTCTTTACGG CTCGTACAGC 

1501 AGCCTGTTCG TCCCGCAATC GCAAAAAGAC GAACACGGCA GCTACCTGAA 

1551 ACCCGTAACC GGCAACAATC TGGAAGCCGG CATCAAAGGC GAATGGCTTG 

1601 AAGGCCGTCT GAACGCATCC GCCGCCGTGT ACCGCGCCCG TAAAAACAAC 

1651 CTCGCCACCG CAGCAGGACG CGACCCGAGC GGCAACACCT ACTACCGCGC 

1701 CGCCAACCAA GCCAAAACCC ACGGCTGGGA AATCGAAGTC GGCGGCCGCA 

1751 TCACGCCCGA ATGGCAGATA CAGGCAGGTT ACAGCCAAAG CAAAACCCGC 

1801 GACCAAGACG GCAGCCGCCT GAACCCCGAC AGCGTACCCG AACGCAGCTT 

1851 CAAACTCTTC ACTGCCTACC ACTTTGCCCC CGAAGCCCCC AGCGGCTGGA 

1901 CCATCGGCGC AGGCGTGCGC TGGCAGAGCG AAACCCACAC CGACCCTGCC 

1951 ACGCTCCGCA TCCCCAACCC CGCCGCCAAA GCCCGCGCCG CCGACAACAG 

2001 CCGCCAAAAA GCCTACGCCG TCGCCGACAT CATGGCGCGT TACCGCTTCA 

2051 ATCCGCGCGC CGAACTGTCG CTGAACGTGG ACAATCTGTT CAACAAACAC 

2101 TACCGCACCC AGCCCGACCG CCACAGCTAC GGCGCACTGC GGACAGTGAA 

2151 CGCGGCGTTT ACCTATCGGT TTAAATAA 

This corresponds to the amino acid sequence <SEQ ID 666; ORF23-1>:

1 MTRFKYSLLF AALLPVYAQA DVSVSDDPKP QESTELPTIT VTADRTASSN

51 DGYTVSGTHT PLGLPMTLRE IPQSVSVITS QQMRDQNIKT LDRALLQATG

101 TSRQIYGSDR AGYNYLFARG SRIANYQING IPVADALADT GNANTAAYER

151 VEVVRGVAGL LDGTGEPSAT VNLVRKRLTR KPLFEVRAEA GNRKHFGLDA

201 DVSGSLNTEG TLRGRLVSTF GRGDSWRRRE RSRDAELYGI LEYDIAPQTR






251 VHAGMDYQQA KETADAPLSY AVYDSQGYAT AFGPKDNPAT NWANSRHRAL

301 NLFAGIEHRF NQDWKLKAEY DYTRSRFRQP YGVAGVLSID HNTAATDLIP

351 GYWHADPRTH SASVSLIGKY RLFGREHDLI AGINGYKYAS NKYGERSIIP

401 NAIPNAYEFS RTGAYPQPAS FAQTIPQYGT RRQIGGYLAT RFRAADNLSL

451 ILGGRYTRYR TGSYDSRTQG MTYVSANRFT PYTGIVFDLT GNLSLYGSYS

501 SLFVPQSQKD EHGSYLKPVT GNNLEAGIKG EWLEGRLNAS AAVYRARKNN

551 LATAAGRDPS GNTYYRAANQ AKTHGWEIEV GGRITPEWQI QAGYSQSKTR

601 DQDGSRLNPD SVPERSFKLF TAYHFAPEAP SGWTIGAGVR WQSETHTDPA

651 TLRIPNPAAK ARAADNSRQK AYAVADIMAR YRFNPRAELS LNVDNLFNKH

701 YRTQPDRHSY GALRTVNAAF TYRFK*



Computer analysis of this amino acid sequence gave the following results: 

Homology with the ferric-pseudobactin receptor PupB of Pseudomonas putida (accession number P38047)
ORF23 and PupB protein show 32% aa identity in 205aa overlap:

Orf23   6 FARGSRIANYQINGIPVADALADTGNANTAAYERVEVVRGVAGLLDGTGEPSATVNLVRK 65
          ++RG  I NY+++G+P +  L D  + + A ++RVE+VRG  GL+ G G PSAT+NL+RK
PupB  215 WSRGFAIQNYEVDGVPTSTRL-DNYSQSMAMFDRVEIVRGATGLISGMGNPSATINLIRK 273

Orf23  66 RLTRKPLFEVRAEAGNRKHFGLDADVSGSLNTEXXLRGRLVSTFXXXXXXXXXXXXXXAE 125
          R T + + EAGN +G DVSG L +RGR V+ +
PupB  274

Orf23 126
          +YGI E+D++ T + D+PL S G N A +W+
PupB  334

Orf23 184 SHHRALNLFAGIEHRFNQDWKLKAE 208
          + H +FIE+ WKE
PupB  392


Homology with a predicted ORF from N.meningitidis (strain A)

ORF23 shows 95.7% identity over a 211aa overlap with an ORF (ORF23a) from strain A of N.
meningitidis:

                                                  10        20        30
orf23.pep                                 GYNYLFARGSRIANYQINGIPVADALADTG
                                          ||||||||||||||||||||||||||||||
orf23a      QMRDQNIKALDRALLQATGTSRQIYGSDRAGYNYLFARGSRIANYQINGIPVADALADTG
                   90       100       110       120       130       140

                    40        50        60        70        80        90
orf23.pep   NANTAAYERVEVVRGVAGLLDGTGEPSATVNLVRKRLTRKPLFEVRAEAGNRKHFGLDAD
            |||||||||||||||||||||||||||||||||||| |||||||||||||||||||| ||
orf23a      NANTAAYERVEVVRGVAGLLDGTGEPSATVNLVRKRPTRKPLFEVRAEAGNRKHFGLGAD
                  150       160       170       180       190       200

                   100       110       120       130       140       150
orf23.pep   VSGSLNTEXXLRGRLVSTFGRGDSWRRRERSRXAELYGILEYDIAPQTRVHAXMDYQQAK
            ||||||:|  ||||||||||||||||:||||| ||||||||||||||||||| |||||||
orf23a      VSGSLNAEGTLRGRLVSTFGRGDSWRQRERSRDAELYGILEYDIAPQTRVHAGMDYQQAK
                  210       220       230       240       250       260

                   160       170       180       190       200       210
orf23.pep   ETADAPLSYAVYDSQGYATAFGPKDNPATNWANSHHRALNLFAGIEHRFNQDWKLKAEYD
            ||||||||||||||||||||||||||||||||||:|||||||||||||||||||||||||
orf23a      ETADAPLSYAVYDSQGYATAFGPKDNPATNWANSRHRALNLFAGIEHRFNQDWKLKAEYD
                  270       280       290       300       310       320

orf23.pep   Y
            |
orf23a      YTRSRFRQPYGVAGVLSIDHNTAATDLIPGYWHADPRTHSASVSLIGKYRLFGREHDLIA
                  330       340       350       360       370       380

The complete length ORF23a nucleotide sequence <SEQ ID 667> is: 
1 ATGACACGCT TCAAATATTC CCTGCTGTTT GCCGCCCTGT TGCCCGTGTA

51 CGCGCAGGCC GATGTTTCTG TTTCAGACGA CCCAAAACCG CAGGAAAGCA 

101 CTGAATTGCC GACCATCACC GTTACCGCCG ACCGCACCGC GAGTTCCAAC 

151 GACGGCTACA CTGTTTCCGG CACGCACACC CCGCTCGGGC TGCCCATGAC 

201 CCTGCGCGAA ATCCCGCAGA GCGTCAGCGT CATCACATCG CAACAAATGC 

251 GCGACCAAAA CATCAAAGCG CTCGACCGCG CCCTGTTGCA GGCGACCGGC 

301 ACCAGCCGCC AGATTTACGG CTCCGACCGC GCGGGCTACA ACTACCTGTT 

351 CGCGCGCGGC AGCCGCATCG CCAACTACCA AATCAACGGC ATCCCCGTTG 

401 CCGACGCGCT GGCCGATACG GGCAATGCCA ACACCGCCGC CTATGAGCGC 

451 GTAGAAGTCG TGCGCGGCGT GGCGGGGCTG CTGGACGGCA CGGGCGAGCC 

501 TTCCGCCACC GTCAATCTGG TGCGCAAACG CCCGACCCGC AAGCCATTGT 

551 TTGAAGTCCG CGCCGAAGCG GGCAACCGCA AACATTTCGG GCTGGGCGCG 

601 GACGTATCGG GCAGCCTGAA TGCCGAAGGC ACGCTGCGCG GCCGCCTGGT 

651 TTCCACCTTC GGACGCGGCG ACTCGTGGCG GCAGCGCGAA CGCAGCCGCG 

701 ATGCCGAACT CTACGGCATT TTGGAATACG ACATCGCACC GCAAACCCGC 

751 GTCCACGCAG GCATGGACTA CCAGCAGGCG AAAGAAACCG CCGACGCGCC 

801 GCTCAGCTAC GCCGTGTACG ACAGCCAAGG TTATGCCACC GCCTTCGGCC 

851 CGAAAGACAA CCCCGCCACA AATTGGGCGA ACAGCCGCCA CCGTGCGCTC 

901 AACCTGTTCG CCGGCATCGA ACACCGCTTC AACCAAGACT GGAAACTCAA 

951 AGCCGAATAC GACTACACCC GCAGCCGCTT CCGCCAGCCC TACGGCGTAG 

1001 CAGGCGTGCT TTCCATCGAC CACAACACCG CCGCCACCGA CCTGATTCCC

1051 GGTTATTGGC ACGCCGACCC GCGCACCCAC AGCGCCAGCG TGTCATTAAT 

1101 CGGCAAATAC CGCCTGTTCG GCCGCGAACA CGATTTAATC GCGGGTATCA 

1151 ACGGTTACAA ATACGCCAGC AACAAATACG GCGAACGCAG CATCATCCCC 

1201 AACGCCATTC CCAACGCCTA CGAATTTTCC CGCACGGGTG CCTACCCGCA 

1251 GCCTGCATCG TTTGCCCAAA CCATCCCGCA ATACGGCACC AGGCGGCAAA 

1301 TCGGCGGCTA TCTCGCCACC CGTTTCCGCG CCGCCGACAA CCTTTCGCTG 

1351 ATACTCGGCG GCAGATACAG CCGTTACCGC ACCGGCAGCT ACGACAGCCG 

1401 CACACAAGGC ATGACCTATG TGTCCGCCAA CCGTTTCACC CCCTACACAG 

1451 GCATCGTGTT CGACCTGACC GGCAACCTGT CGCTTTACGG CTCGTACAGC 

1501 AGCCTGTTCG TCCCGCAATC GCAAAAAGAC GAACACGGCA GCTACCTGAA 

1551 ACCCGTAACC GGCAACAATC TGGAAGCCGG CATCAAAGGC GAATGGCTTG 

1601 AAGGCCGTCT GAACGCATCC GCCGCCGTGT ACCGCGCCCG TAAAAACAAC 

1651 CTCGCCACCG CAGCAGGACG CGACCCGAGC GGCAACACCT ACTACCGCGC 

1701 CGCCAACCAA GCCAAAACCC ACGGCTGGGA AATCGAAGTC GGCGGCCGCA 

1751 TCACGCCCGA ATGGCAGATA CAGGCAGGTT ACAGCCAAAG CAAAACCCGC 

1801 GACCAAGACG GCAGCCGCCT GAACCCCGAC AGCGTACCCG AACGCAGCTT 

1851 CAAACTCTTC ACTGCCTACC ACTTTGCCCC CGAAGCCCCC AGCGGCTGGA 

1901 CCATCGGCGC AGGCGTGCGC TGGCAGAGCG AAACCCACAC CGACCCTGCC 

1951 ACGCTCCGCA TCCCCAACCC CGCCGCCAAA GCCCGCGCCG CCGACAACAG 

2001 CCGCCAAAAA GCCTACGCCG TCGCCGACAT CATGGCGCGT TACCGCTTCA 

2051 ATCCGCGCGC CGAACTGTCG CTGAACGTGG ACAATCTGTT CAACAAACAC 

2101 TACCGCACCC AGCCCGACCG CCACAGCTAC GGCGCACTGC GGACAGTGAA 

2151 CGCGGCGTTT ACCTATGGGT TTAAATAA 

This encodes a protein having amino acid sequence <SEQ ID 668>: 

1 MTRFKYSLLF AALLPVYAQA DVSVSDDPKP QESTELPTIT VTADRTASSN 

51 DGYTVSGTHT PLGLPMTLRE IPQSVSVITS QQMRDQNIKA LDRALLQATG 

101 TSRQIYGSDR AGYNYLFARG SRIANYQING IPVADALADT GNANTAAYER 

151 VEVVRGVAGL LDGTGEPSAT VNLVRKRPTR KPLFEVRAEA GNRKHFGLGA

201 DVSGSLNAEG TLRGRLVSTF GRGDSWRQRE RSRDAELYGI LEYDIAPQTR 

251 VHAGMDYQQA KETADAPLSY AVYDSQGYAT AFGPKDNPAT NWANSRHRAL 

301 NLFAGIEHRF NQDWKLKAEY DYTRSRFRQP YGVAGVLSID HNTAATDLIP 

351 GYWHADPRTH SASVSLIGKY RLFGREHDLI AGINGYKYAS NKYGERSIIP 

401 NAIPNAYEFS RTGAYPQPAS FAQTIPQYGT RRQIGGYLAT RFRAADNLSL 

451 ILGGRYSRYR TGSYDSRTQG MTYVSANRFT PYTGIVFDLT GNLSLYGSYS 

501 SLFVPQSQKD EHGSYLKPVT GNNLEAGIKG EWLEGRLNAS AAVYRARKNN 

551 LATAAGRDPS GNTYYRAANQ AKTHGWEIEV GGRITPEWQI QAGYSQSKTR 

601 DQDGSRLNPD SVPERSFKLF TAYHFAPEAP SGWTIGAGVR WQSETHTDPA 

651 TLRIPNPAAK ARAADNSRQK AYAVADIMAR YRFNPRAELS LNVDNLFNKH 

701 YRTQPDRHSY GALRTVNAAF TYRFK* 

ORF23a and ORF23-1 show 99.2% identity in 725 aa overlap:

                    10        20        30        40        50        60
orf23a.pep  MTRFKYSLLFAALLPVYAQADVSVSDDPKPQESTELPTITVTADRTASSNDGYTVSGTHT
            ||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
orf23-1     MTRFKYSLLFAALLPVYAQADVSVSDDPKPQESTELPTITVTADRTASSNDGYTVSGTHT
                    10        20        30        40        50        60
Homology with a predicted ORF from N.gonorrhoeae

ORF23 shows 93.4% identity over a 211aa overlap with a predicted ORF (ORF23.ng) from N.
gonorrhoeae:

orf23.pep            GYNYLFARGSRIANYQINGIPVADALADTGNANTAAYERVEVVRGVAGLLD   51
                     ||||||||||||||||||||||||||||||||||||||||||||||||| |
orf23ng     SAVDACRIPGYNYLFARGSRIANYQINGIPVADALADTGNANTAAYERVEVVRGVAGLPD   60

orf23.pep   GTGEPSATVNLVRKRLTRKPLFEVRAEAGNRKHFGLDADVSGSLNTEXXLRGRLVSTFGR  111
            ||||||||||||||: |||||||||||||||||||| ||||||||:|  |||||||||||
orf23ng     GTGEPSATVNLVRKHPTRKPLFEVRAEAGNRKHFGLGADVSGSLNAEGTLRGRLVSTFGR  120

orf23.pep   GDSWRRRERSRXAELYGILEYDIAPQTRVHAXMDYQQAKETADAPLSYAVYDSQGYATAF  171
            |||||: |||| ||||||||||||||||||| ||||||||||||||||||||||||||||
orf23ng     GDSWRQLERSRDAELYGILEYDIAPQTRVHAGMDYQQAKETADAPLSYAVYDSQGYATAF  180

orf23.pep   GPKDNPATNWANSHHRALNLFAGIEHRFNQDWKLKAEYDY  211
            ||||||||||:||::|||||||||||||||||||||||||
orf23ng     GPKDNPATNWSNSRNRALNLFAGIEHRFNQDWKLKAEYDYTRSRFRQPYGVAGVLSIDHS  240

The ORF23ng nucleotide sequence <SEQ ID 669> is predicted to encode a protein comprising
amino acid sequence <SEQ ID 670>:




1 SAVDACRIPG YNYLFARGSR IANYQINGIP VADALADTGN ANTAAYERVE

51 VVRGVAGLPD GTGEPSATVN LVRKHPTRKP LFEVRAEAGN RKHFGLGADV

101 SGSLNAEGTL RGRLVSTFGR GDSWRQLERS RDAELYGILE YDIAPQTRVH 

151 AGMDYQQAKE TADAPLSYAV YDSQGYATAF GPKDNPATNW SNSRNRALNL 

201 FAGIEHRFNQ DWKLKAEYDY TRSRFRQPYG VAGVLSIDHS TAATDLIPGY 

251 WHADPRTHSA SMSLTGKYRL FGREHDLIAG INGYKYASNK YGERSIIPNA 

301 IPNAYEFSRT GAYPQPSSFA QTIPQYDTRR QIGGYLATRF RAADNLSLIL 

351 GGRYSRYRAG SYNSRTQGMT YVSANRFTPY TGIVFDLTGN LSLYGSYSSL 

401 FVPQLQKDEH GSYLKPVTGN NLEADIKGEW LEGRLNASAA VYRARKNNLA

451 TAAGRDQSGN TYYRAANQAK THGWEIEVGG RITPEWQIQA GYSQSKPRDQ 

501 DGSRLNPDSV PERSFKLFTA YHLAPEAPSG RTIGAGVRRQ GETHTDPAAL 

551 RIPNPAAKAR AVANSRQKAY AVADIMARYR FNPRTELSLN VDNLFNKHYR 

601 TQPDRHSYGA LRTVNAAFTY RFK* 

Further work revealed the complete nucleotide sequence <SEQ ID 671>: 



1 ATGACACGCT TCAAATACTC CCTGCTTTTT GCCGCCCTGC TACCCGTGTA
51 CGCGCAGGCC GATGTTTCTG TTTCAGACGA CCCCAAACCG CAGGAAAGCA
101 CCGAATTGCC GACCATCACC GTTACCGCCG ACCGCACCGC GAGTTCCAAC
151 GACGGCTACA CCGTTTCCGG CACGCACACC CCGTTCGGGC TGCCCATGAC
201 CCTGCGCGAA ATCCCGCAGA GCGTCAGCGT CATCACATCG CAACAAATGC
251 GCGACCAAAA CATCAAAACG CTCGACCGCG CCCTGTTGCA GGCGACCGGC
301 ACCAGCCGCC AGATTTACGG CTCCGACCGC GCGGGCTACA ACTACCTGTT
351 CGCGCGCGGC AGCCGCATCG CCAACTACCA AATCAACGGC ATCCCCGTTG
401 CCGACGCGCT GGCCGATACG GGCAATGCCA ACACCGCCGC CTATGAGCGC
451 GTAGAAGTCG TGCGCGGCGT GGCGGGGCTG CCGGACGGCA CGGGCGAGCC
501 TTCTGCCACC GTCAATCTGG TACGCAAACA CCCGACCCGC AAGCCATTGT
551 TTGAAGTCCG CGCCGAAGCC GGCAACCGCA AACATTTCGG GCTGGGCGCG
601 GACGTATCGG GCAGCCTGAA CGCCGAAGGC ACGCTGCGCG GCCGCCTGGT
651 TTCCACCTTC GGACGCGGCG ACTCGTGGCG GCAGCTCGAA CGCAGCCGCG
701 ATGCCGAACT CTACGGCATT TTGGAATACG ACATCGCACC GCAAACCCGC
751 GTCCACGCAG GCATGGACTA CCAGCAGGCG AAAGAAACCG CAGACGCGCC
801 GCTCAGCTAC GCCGTGTACG ACAGCCAAGG TTATGCCACC GCCTTCGGCC
851 CAAAAGACAA CCCCGCCACA AATTGGTCGA ACAGCCGCAA CCGTGCGCTC
901 AACCTGTTCG CCGGCATAGA ACACCGCTTC AACCAAGACT GGAAACTCAA
951 AGCCGAATAC GACTACACCC GTAGCCGCTT CCGCCAGCCC TACGGTGTGG
1001 CAGGCGTACT TTCCATCGAC CACAGCACTG CCGCCACCGA CCTGATTCCC
1051 GGTTATTGGC ACGCcgatcc GCGCACCCAC AGCGCCAGCA TGTCATTGAC
1101 CGGCAAATAC CgcctGTTCG GCCGCGAGCA CGATTTAATC GCGGGTATCA
1151 ACGGCTACAA ATACGCCAGC AACAAATACG GCGAACGCAG CATCATTCCC
1201 AACGCCATTC CCAACGCCTA CGAATTTTCC CGCACGGGCG CCTATCCGCA
1251 GCCATCATCG TTTGCCCAAA CCATCCCGCA ATACGACACC AGGCGGCAAA
1301 TCGGCGGCTA TCTCGCCACC CGTTTCCGCG CCGCCGACAA CCTTTCGCTG
1351 ATACTCGGCG GCAGATACAG CCGCTACCGC GCAGGCAGCT ACAACAGCCG
1401 CACACAAGGC ATGACCTATG TGTCCGCCAA CCGTTTCACC CCCTACACAG
1451 GCATCGTGTT CGATCTGACC GGCAACCTGT CGCTTTACGG CTCGTACAGC
1501 AGCCTGTTCG TCCCGCAATT GCAAAAAGAC GAACACGGCA GCTACCTGAA
1551 ACCCGTAACC GGCAACAATC TGGAAGCCGA CATCAAAGGC GAATGGCTTG
1601 AAGGGCGTCT GAACGCATCC GCCGCCGTGT ACCGCGCCCG TAAAAACAAC
1651 CTCGCCACCG CAGCAGGACG CGACCAGAGC GGCAACACCT ACTATCGCGC
1701 CGCCAACCAA GCCAAAACCC ACGGCTGGGA AATCGAAGTC GGCGGCCGCA
1751 TCACGCCCGA ATGGCAGATA CAGGCAGGCT ACAGCCAAAG CAAACCCCGC
1801 GACCAAGACG GCAGCCGCCT GAACCCCGAC AGCGTAcCCG AACGCAGCTT
1851 CAAACTCTTC ACCGCCTACC ACTTAGCCCC CGAAGCCCCC AGCGGCCGGA
1901 CCATcggTGC GGGTGTGCGC CGGCAGGGCG AAACCCACAC CGACCCAGCC
1951 GCGCTCCGCA TCCCCAACCC CGCCGCCAAA GCCCGCGCCG TCGCCAACAG
2001 CCGCCAGAAA GCCTACGCCG TCGCCGACAT CATGGCGCGT TACCGCTTCA
2051 ATCCGCGCAC CGAACTGTCG CTGAACGTGG ACAACCTGTT CAACAAACAC
2101 TACCGCACCC AGCCCGACCG CCACAGCTAC GGCGCACTGC GGACAGTGAA
2151 CGCGGCGTTT ACCTATCGGT TTAAATAA



This corresponds to the amino acid sequence <SEQ ID 672; ORF23ng-1>:



1 MTRFKYSLLF AALLPVYAQA DVSVSDDPKP QESTELPTIT VTADRTASSN 

51 DGYTVSGTHT PFGLPMTLRE IPQSVSVITS QQMRDQNIKT LDRALLQATG 

101 TSRQIYGSDR AGYNYLFARG SRIANYQING IPVADALADT GNANTAAYER 

151 VEVVRGVAGL PDGTGEPSAT VNLVRKHPTR KPLFEVRAEA GNRKHFGLGA

201 DVSGSLNAEG TLRGRLVSTF GRGDSWRQLE RSRDAELYGI LEYDIAPQTR

251 VHAGMDYQQA KETADAPLSY AVYDSQGYAT AFGPKDNPAT NWSNSRNRAL 

301 NLFAGIEHRF NQDWKLKAEY DYTRSRFRQP YGVAGVLSID HSTAATDLIP 

351 GYWHADPRTH SASMSLTGKY RLFGREHDLI AGINGYKYAS NKYGERSIIP 

401 NAIPNAYEFS RTGAYPQPSS FAQTIPQYDT RRQIGGYLAT RFRAADNLSL 

451 ILGGRYSRYR AGSYNSRTQG MTYVSANRFT PYTGIVFDLT GNLSLYGSYS 

501 SLFVPQLQKD EHGSYLKPVT GNNLEADIKG EWLEGRLNAS AAVYRARKNN 

551 LATAAGRDQS GNTYYRAANQ AKTHGWEIEV GGRITPEWQI QAGYSQSKPR 

601 DQDGSRLNPD SVPERSFKLF TAYHLAPEAP SGRTIGAGVR RQGETHTDPA 

651 ALRIPNPAAK ARAVANSRQK AYAVADIMAR YRFNPRTELS LNVDNLFNKH 

701 YRTQPDRHSY GALRTVNAAF TYRFK* 

ORF23ng-1 and ORF23-1 show 95.9% identity in 725 aa overlap:



10 20 30 40 50 60 

orf 23-1 . pep MTRFKYSLLFAALLPVYAQADVSVSDDPKPQESTELPTITVTADRTASSN DGYTVSGTHT 
I I t I I I I I I i I I I [ I I I I 1 i i [ I I I I I I I 1 I I I I I I I I I I i I t I i i I I I t I I I i I I I I t I 
orf23ng-l MTRFKYSLLFAALLPVYAQADVSVSDDPKPQESTELPTITVTADRTASSN DGYTVSGTHT 

10 20 30 40 50 60 



70 80 90 100 110 120 

orf 23-1 . pep PLGLPMTLREIPQSVSVITSQQMRDQNIKTLDRALLQATGTSRQIYGSDRAGYNYLFARG 
I: I i I I t I I t I I t I I t I i I I t I M I I i I I I I I t I I I I I I I I I t i t I 1 I I I I I i I I I I I I I 
orf23ng-l PFGLPMTLREIPQSVSVITSQQMRDQNIKTLDRALLQATGTSRQIYGSDRAGYNYLFARG 

70 80 90 100 110 120 



130 140 150 160 170 180 

orf 23-1 . pep SRIANYQINGIPVADALADTGNANTAAYERVEWRGVAGLLDGTGEPSATVNLVRKRLTR 
llllllllMltllltlllllltllllllllllllMtIi MMItlllllllll: 11 
orf23ng-l SRIANYQING I PVADALADTGNANTAAYERVEWRGVAGLPDGTGE PS ATVNLVRKHPTR 

130 140 150 160 170 180 



190 200 210 220 230 240 

orf 23-1 . pep KPLFEVRAEAGNRKHFGLDADVSGSLNTEGTLRGRLVSTFGRGDSWRRRERSRDAELYGI 
llllllllltMllltll ll)llllt:llllliMitilMIIIII: tllllllttll 
orf23ng-l KPLFEVRAEAGNRKHFGLGADVSGSLNAEGTLRGRLVSTFGRGDSWRQLERSRDAELYGI 

190 200 210 220 230 240 



250 260 270 280 290 300 

orf 23-1 . pep LEYDIAPQTRVHAGMDYQQAKETADAPLSYAVYDSQGYATAFGPKDNPATNWANSRHRAL 
llllllllillliillllllllllllilllllllliMiliMillilMII:M):tll 
orf23ng-l LEYDIAPQTRVHAGMDYQQAKETADAPLSYAVYDSQGYATAFGPKDNPATNWSNSRNRAL 

250 260 270 280 290 300 



310 320 330 340 350 360 

orf 23-1 . pep NLFAGIEHRFNQDWKLKAEYDYTRSRFRQPYGVAGVLSIDHNTAATDLIPGYWHADPRTH 
[ M I i I I I I M I I I t I I I I M I I I t I t I I I I I I I I I I I I M : I I I I I I I I I I I I I I ) i 1 I 
orf23ng-l NLFAG I EHRFNQDWKLKAEY DYTRSRFRQP YGVAGVLS I DHSTAATDLIPGYWHADPRTH 
310 320 330 340 350 360


370 380 390 400 410 420 

orf 23-1 . pep SASVSLIGKYRLFGREHDLIAGINGYKYASNKYGERSIIPNAIPNAYEFSRTGAYPQPAS 
111:11 I t I I I I I I I I I I I I I I I I t I I I I I I i M I I I I M t I I I I I I I i I i I I I I I I : I 
orf23ng-l SASMSLTGKYRLFGREHDLIAGINGYKYASNKYGERSIIPNAIPNAYEFSRTGAYPQPSS 

370 380 390 400 410 420 

430 440 450 460 470 480 

orf 23-1. pep FAQTIPQYGTRRQIGGYLATRFRAADNLSLILGGRYTRYRTGSYDSRTQGMTYVSANRFT 
I M I i I I I M I I I I I I I I I I I I I t t I I I I I I 1 I I t : I I t : I I t : I I I I t I i I i M I I M 
orf23ng-l FAQTIPQYDTRRQIGGYLATRFRAADNLSLILGGRYSRYRAGSYNSRTQGMTYVSANRFT 

430 440 450 460 470 480 

490 500 510 520 530 540 

orf 23-1 . pep PYTGIVFDLTGNLSLYGSYSSLFVPQSQKDEHGSYLKPVTGNNLEAGIKGEWLEGRLNAS 
I I I t I I I t I I I I I I t t i I I I I t I I I I I I I t I I I I t I I I I t I I I i I I I I I I I I t i I I I I 
orf23ng-l PYTGIVFDLTGNLSLYGSYSSLFVPQLQKDEHGSYLKPVTGNNLEADIKGEWLEGRLNAS 

490 500 510 520 530 540 

550 560 570 580 590 600 

orf 23-1 . pep AAVYRARKNNLATAAGRDPSGNTYYRAANQAKTHGWEIEVGGRITPEWQIQAGYSQSKTR 
I I [ I I I I I I I i I I I I I I I ItllflllillllMllllllttitlMIMIilllllt I 
orf23ng-l AAVYRARKNNLATAAGRDQSGNTYYRAANQAKTHGWEXEVGGRITPEWQIQAGYSQSKPR 

550 560 570 580 590 600 

610 620 630 640 650 660 

or f 23-1 . pep DQDGSRLNPDSVPERSFKLFTAYHFAPEAPSGWTIGAGVRWQSETHTDPATLRIPNPAAK 

I I I I I I t I i I t I I 111111:1111111 llltlll t:lllllll:IIIIIIIM 

orf23ng-l DQDGSRLNPDSVPERSFKLFTAYHLAPEAPSGRTIGAGVRRQGETHTDPAALRIPNPAAK 

610 620 630 640 650 660 

670 680 690 700 710 720 

or f 23-1 . pep ARAADNSRQKAYAVADIMARYRFNPRAELSLNVDNLFNKHYRTQPDRHSYGALRTVNAAF 
Ml: lllllltllllll)illlllt:IIIIIIIIIIIIIltlllMltlllllilltll 
orf23ng-l ARAVANSRQKAYAVADIMARYRFNPRTELSLNVDNLFNKHYRTQPDRHSYGALRTVNAAF 

670 680 690 700 710 720 



orf 23-1. pep TYRFKX

I I I I I I 

orf23ng-l TYRFKX 

In addition, ORF23ng-1 shows significant homology with an OMP from E.coli:



sp|P16869|FHUE_ECOLI OUTER-MEMBRANE RECEPTOR FOR FE(III)-COPROGEN, FE(III)-
FERRIOXAMINE B AND FE(III)-RHODOTRULIC ACID PRECURSOR >gi|1651542|gnl|PID|d1015403
(D90745) Outer membrane protein FhuE precursor [Escherichia coli]
>gi|1651545|gnl|PID|d1015405 (D90746) Outer membrane protein FhuE precursor
[Escherichia coli] >gi|1787344 (AE000210) outer-membrane receptor for Fe(III)-
coprogen, Fe(III)-ferrioxamine B and Fe(III)-rhodotrulic acid precursor
[Escherichia coli] Length = 729
Score = 332 bits (843), Expect = 3e-90

Identities = 228/717 (31%), Positives = 350/717 (48%), Gaps = 60/717 (8%)

Query: 38 TITVTADRTASSN--DGYTVSGTHTPFGLPMTLREIPQSVSVITSQQMRDQNIKTLDRAL 95
T+ V TA + + Y+V+ T + MT R+IPQSV++++ Q+M DQ ++TL +
Sbjct: 43 TVIVEGSATAPDDGENDYSVTSTSAGTKMQMTQRDIPQSVTIVSQQRMEDQQLQTLGEVM 102

Query: 96 LQATGTSRQI YGSDRAG YN YLFARGSRIAN YQINGI P VADALADTGNANTAA 147
G S+ SDRA Y ++RG +I NY ++GIP + DAL+D A
Sbjct: 103 ENTLGISKSQADSDRALY YSRGFQIDNYMVDGIPTYFESRWNLGDALSDM AL 154

Query: 148
+ERVEVVRG GL GTG PSA +N+VRKH T + +V AE G+ AD+ L
Sbjct: 155 FERVEVVRGATGLMTGTGNPSAAINMVRKHATSREFKGDVSAEYGSWNKERYVADLQSPL 214

Query: 207
+G +R R+V DSW GI++ D+ T + AG +YQ+
Sbjct: 215

Query: 267 PLSYAVYDSQGYATAFGPKDNPATNWSNSRNRALNLFAGIEHRFNQDWKLKAEYDYTRSR 326
+++ G + ++ + A +W+ + +F ++ +F W+ ++

Sbjct: 275 WGGLPRWNTDGSSNSYDRARSTAPDWAYNDKEINKVFMTLKQQFADTWQATLNATHSEVE 334 

Query: 327 F--RQPYGVAGVLSIDHSTAA--TDLIPGY-------WHADPRTHSA-SMSLTGKYRLFG 374

F + Y A V D ++ PG+ W++ R A + G Y LFG 

Sbjct: 335 FDSKMMYVDAYVNKADGMLVGPYSNYGPGFDYVGGTGWNSGKRKVDALDLFADGSYELFG 394 

Query: 375 REHDLIAGINGYKYASNKYGER--SIIPNAIPNAYEFSRTGAYPQPSSFAQTIPQYDTRR 432

R+H+L+ G Y +N+Y +I P+ I + Y F+ G +PQ Q++ Q DT

Sbjct: 395 RQHNLMFG-GSYSKQNNRYFSSWANIFPDEIGSFYNFN--GNFPQTDWSPQSLAQDDTTH 451

Query: 433 QIGGYLATRFRAADNLSLILGGRYSRYRAGSYNSRTQGMTY-VSANRFTPYTGIVFDXXX 491

Y ATR AD L LILG RY+ +R + +TY + N TPY G+VFD

Sbjct: 452 MKSLYAATRVTLADPLHLILGARYTNWRVDT LTYSMEKNHTTPYAGLVFDIND 504

Query: 492 XXXXXXXXXXXFVPQLQKDEHGSYLKPVTGNNLEADIKGEWLEGRLNASAAVYRARKNNL 551

F PQ +D G YL P+TGNN E +K +W+ RL + A++R ++N+ 
Sbjct: 505 NWSTYASYTSIFQPQNDRDSSGKYLAPITGNNYELGLKSDWMNSRLTTTLAIFRXEQDNV 564 

Query: 552 ATAAGR DQSGNTYYRAANQAKTHGWEIEVGGRITPEWQIQAGYSQSKPRDQDGSRLN 608 

A + G +G T Y+A + + G E E+ G IT WQ+ G ++ D +G+ +N 

Sbjct: 565 AQSTGTPIPGSNGETAYKAVDGTVSKGVEFELNGAITDNWQLTFGATRYIAEDNEGNAVN 624 

Query: 609 PDSVPERSFKLFTAYHLAPEAPSGRTIGAGVRRQGETHTDPAALRIPNPAAKARAVANSR 668 

P ++P + K+FT+Y L P P T+G GV Q +TD P RA 
Sbjct: 625 P-NLPRTTVKMFTSYRL-PVMPE-LTVGGGA^WQNRVYTDTV TPYGTFRA E 672 

Query: 669 QKAYAVADIMARYRFNPRTELSLNVDNLFNKHYRTQPDRH-SYGALRTVNAAFTYRF 724 

Q +YA+ D+ RY+ L NV+NLF+K Y T + YG R + TY+F 

Sbjct: 673 QGSYALVDLFTRYQVTKNFSLQGNVNNLFDKTYDTNVEGSIVYGTPRNFSITGTYQF 729 
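
The percentages on the summary line of the FhuE comparison follow directly from the raw counts. As a hedged illustration (the helper names `parse_summary` and `truncated_percent` are assumptions introduced here, not part of any BLAST tool), the sketch below parses the summary line and shows that the printed values are consistent with truncating, rather than rounding, each ratio: 228/717 is about 31.8% and is reported as 31%.

```python
import re

SUMMARY = "Identities = 228/717 (31%), Positives = 350/717 (48%), Gaps = 60/717 (8%)"


def parse_summary(line):
    """Extract {name: (count, total, reported_percent)} from a summary line."""
    pattern = r"(\w+)\s*=\s*(\d+)/(\d+)\s*\((\d+)%\)"
    return {
        name: (int(count), int(total), int(pct))
        for name, count, total, pct in re.findall(pattern, line)
    }


def truncated_percent(count, total):
    """Integer percentage with the fractional part discarded."""
    return count * 100 // total


for name, (count, total, reported) in parse_summary(SUMMARY).items():
    exact = 100.0 * count / total
    print(f"{name}: {count}/{total} = {exact:.2f}%, "
          f"truncated {truncated_percent(count, total)}%, reported {reported}%")
```

On this reading, about one residue in three is identical and roughly half are identical or conservatively substituted across the 717-column alignment.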



Based on this analysis, it was predicted that these proteins from N.meningitidis and N.gonorrhoeae,
and their epitopes, could be useful antigens for vaccines or diagnostics, or for raising antibodies.



ORF23-1 (77.5kDa) was cloned in pET and pGex vectors and expressed in E.coli, as described
above. The products of protein expression and purification were analyzed by SDS-PAGE. Figure
15A shows the results of affinity purification of the His-fusion protein, and Figure 15B shows the
results of expression of the GST-fusion in E.coli. Purified His-fusion protein was used to immunise
mice, whose sera were used for Western blot (Figure 15C) and for ELISA (positive result). These
experiments confirm that ORF23-1 is a surface-exposed protein, and that it is a useful immunogen.



Example 80

The following partial DNA sequence was identified in N.meningitidis <SEQ ID 673>: 

1 ATGCGCACGG CAGTGGTTTT GCTGTTGATC ATGCCGATGG CGGCTTCGTC 

51 GGCAATGATG CCGGAAATGG TGTGCGCGGG CGTGTCGCCG GGAACGGCAA 

101 TCATATCCAA GCCGACCGAA CAAACGGCGG TCATGGCTTC GAGTTTGTCC 

151 AGCGTCAgcA CGCCTGCTTC GGCGgcGgCa ATCATACCTT CGTCTTCGGA

201 AACGGGGATA AACGcGCCAC TCAAACCCCC GACCGCGCTG GAAGCCATCA 

251 TGCCGCCTTT TTTCACGGCA TCGTTCAGCA ATGCCAAAGC TGCTGTTGTG 

301 CCGTGCGTAC CGCAGACGCT CAAGCCCATT TnTTCAAGAA TGCGTGCCAC 

351 TnAGTCGCCG ACGGGG. . 

This corresponds to the amino acid sequence <SEQ ID 674; ORF24>:

1 MRTAVVLLLI MPMAASSAMM PEMVCAGVSP GTAIISKPTE QTAVMASSLS
51 SVSTPASAAA IIPSSSETGI NAPLKPPTAL EAIMPPFFTA SFSNAKAAVV
101 PCVPQTLKPI XSRMRATXSP TG..
Further work revealed the complete nucleotide sequence <SEQ ID 675>:

1 ATGCGCACGG CAGTGGTTTT GCTGTTGATC ATGCCGATGG CGGCTTCGTC 

51 GGCAATGATG CCGGAAATGG TGTGCGCGGG CGTGTCGCCG GGAACGGCAA

101 TCATATCCAA GCCGACCGAA CAAACGGCGG TCATGGCTTC GAGTTTGTCC 

151 AGCGTCAGCA CGCCTGCTTC GGCGGCGGCA ATCATACCTT CGTCTTCGGA 

201 AACGGGGATA AACGCGCCAC TCAAACCCCC GACCGCGCTG GAAGCCATCA 

251 TGCCGCCTTT TTTCACGGCA TCGTTCAGCA ATGCCAAAGC TGCTGTTGTG 

301 CCGTGCGTAC CGCAGACGCT CAAGCCCATT TCTTCAAGAA TGCGTGCCAC 

351 TGAGTCGCCG ACGGCGGGGG TCGGCGCCAG CGACAAGTCG AGAATACCAA 

401 ACGGGATATT CAGCATTTTT GAGGCTTCGC GGCCGATGAG TTCGCCCACG 

451 CGGGTAATTT TGAAAGCAGT TTTCTTCACT ACTTCCGCAA CTTCGGTCAA 

501 TGTCGTTGCA TCTGAATTTT CCAACGCGGC TTTTACGACA CCTGGGCCGG 

551 ATACGCCGAC ATTGATAACG GCATCCGCTT CGCCCGAACC ATGAAACGCG 

601 CCCGCCATAA ACGGGTTGTC TTCCACCGCG TTGCAGAACA CGACAATTTT 

651 AGCGCAGCCG AAACCTTCGG GCGTGATTTC CGCCGTGCGT TTGACGGTTT 

701 CGCCCGCCAG CTTGACCGCA TCCATATTGA TACCGGCACG CGTACTGCCG 

751 ATATTGATGG AGCTGCACAC AATATCGGTA GTCTTCATCG CTTCGGGAAT 

801 GGAGCGGATT AACACCTCAT CCGAAGGCGA CATCCCTTTT TGCACCAACG 

851 CGGAAAAACC GCCGATAAAA GACACACCGA TGGCTTTGGC AGCTTTATCC 

901 AAAGTTTGCG CCACGCTGAC GTAA 

This corresponds to the amino acid sequence <SEQ ID 676; ORF24-1>:



1 MRTAVVLLLI MPMAASSAMM PEMVCAGVSP GTAIISKPTE QTAVMASSLS

51 SVSTPASAAA IIPSSSETGI NAPLKPPTAL EAIMPPFFTA SFSNAKAAVV

101 PCVPQTLKPI SSRMRATESP TAGVGASDKS RIPNGIFSIF EASRPMSSPT

151 RVILKAVFFT TSATSVNVVA SEFSNAAFTT PGPDTPTLIT ASASPEP*NA

201 PAINGLSSTA LQNTTILAQP KPSGVISAVR LTVSPASLTA SILIPARVLP

251 ILMELHTISV VFIASGMERI NTSSEGDIPF CTNAEKPPIK DTPMALAALS

301 KVCATLT*

Computer analysis of this amino acid sequence gave the following results: 



Homology with a predicted ORF from N.meningitidis (strain A)

ORF24 shows 96.4% identity over a 307 aa overlap with an ORF (ORF24a) from strain A of N.
meningitidis:

10 20 30 40 50 60 

orf 24a . pep MRTAWLLLIMPMAASSAMMPEMVCAGVSPGTAIISXPTEQTAVIASSLSNVSTPASAAA 
I I I I i [ I I I I I I I I I I I t I t I I I I I I I I I I I I I i I I IIIIIII:IMII:|llllltil 
orf 24 MRTAWLLLIMPMAASSAMMPEMVCAGVSPGTAIISKPTEQTAVMASSLSSVSTPASAAA 

10 20 30 40 50 60 



70 80 90 100 110 120 

or f 2 4 a . pep 1 1 PSSSXTG INAPLKPPTALEAIMPP FFTAS FSNAKAAW PCVPQTLKPI S SRMRATES P 
tlllll Illlllllllilllitllllllllllllllllltlllllllllllllllltll 
orf 24 IIPSSSETGINAPLKPPTALEAIMPPFFTASFSNAKAAWPCVPQTLKPISSRMRATESP 

70 80 90 100 110 120 



130 140 150 160 170 180 

orf 24a . pep TAGVGASDKSRIPNGIFSIFEASRPMSSPTRVILKAVFFTTSATSVNWASEFSNAAFTT 
I I I I ! I I I I t M I I I I t I I i I I i I I i I I i I I I I I i I I t t t I t M I I I I I I I I I i I I i 1 I I 
orf 24 TAGVGASDKSRIPNGIFSIFEASRPMSSPTRVILKAVFFTTSATSVNWASEFSNAAFTT 

130 140 150 160 170 180 



190 200 210 220 230 240 

orf 24a . pep PGPDTPTLITASASPEPXNAPAIXGLSSXALQNTTILAQPKPSSVISXVRLMVSPASLTA 
llltlltlllMlllltlillll ltll:llMIIIIIIIIII:itl til lltlllll 
orf 24 PGPDTPTLITASASPEPXNAPAINGLSSTALQNTTILAQPKPSGVISAVRLTVSPASLTA 

190 200 210 220 230 240 



250 260 270 280 290 300 

orf 24a . pep SILIPARVLPILMELHTISWFIASGMERXNTSSEGDIPFCTSAEKPPIKDTPMALAALS 
I I t I t I i I I I I i I I I 1 I I I I t I I I I I I I t I i I I I I I I I I i I : I I I 1 I I I I I I I I I I I I I 
orf 24 SILIPARVLPILMELHTISWFIASGMERINTSSEGDIPFCTNAEKPPIKDTPMALAALS 

250 260 270 280 290 300 
orf24a.pep KVCATLTX
i I M i I I I 
orf24 KVCATLTX 

The complete length ORF24a nucleotide sequence <SEQ ID 677> is:

1 ATGCGCACGG CAGTGGTTTT GCTGTTGATC ATGCCGATGG CGGCTTCGTC 

51 GGCAATGATG CCGGAAATGG TGTGCGCGGG TGTGTCGCCG GGAACGGCAA 

101 TCATATCCAA NCCGACCGAA CAAACGGCGG TCATCGCTTC GAGTTTATCC 

151 AACGTCAGCA CGCCTGCTTC GGCGGCGGCA ATCATACCTT CGTCTTCGGA 

201 NACGGGGATA AACGCGCCAC TCAAACCGCC AACCGCGCTC GAAGCCATCA

251 TGCCGCCCTT TTTCACGGCA TCGTTCAGCA ATGCCAAAGC TGCTGTTGTG 

301 CCGTGCGTAC CGCAGACGCT CAAACCCATT TCTTCAAGAA TGCGCGCCAC 

351 CGAGTCGCCG ACGGCAGGGG TCGGTGCCAG CGACAAGTCG AGAATACCAA 

401 ACGGGATATT CAGCATTTTT GAGGCTTCGC GGCCGATGAG TTCGCCCACG 

451 CGGGTAATTT TGAAGGCGGT TTTCTTCACA ACTTCGGCAA CTTCGGTCAA 

501 TGTCGTTGCA TCCGAATTTT CCAACGCGGC TTTTACGACA CCCGGGCCGG 

551 ATACGCCGAC ATTAATCACA GCATCCGCTT CGCCTGAGCC GTGAAACGCG 

601 CCCGCCATAN ACGGGTTGTC TTCCNCCGCG TTGCAGAACA CGACGATTTT 

651 GGCGCAGCCG AAACCTTCTA GTGTGATTTC ANCCGTGCGT TTGATGGTTT 

701 CGCCCGCCAG TCTGACCGCG TCCATATTGA TACCGGCGCG CGTACTGCCG 

751 ATATTGATGG AGCTGCACAC GATATCAGTA GTCTTCATCG CTTCGGGAAT 

801 GGAACGGATN AACACCTCGT CAGAAGGCGA CATACCTTTT TGCACCAGCG 

851 CGGAAAAGCC GCCAATAAAA GACACGCCGA TGGCTTTGGC AGCCTTATCC 

901 AAAGTTTGCG CCACGCTGAC GTAA 

This encodes a protein having amino acid sequence <SEQ ID 678>: 

1 MRTAVVLLLI MPMAASSAMM PEMVCAGVSP GTAIISXPTE QTAVIASSLS

51 NVSTPASAAA IIPSSSXTGI NAPLKPPTAL EAIMPPFFTA SFSNAKAAVV

101 PCVPQTLKPI SSRMRATESP TAGVGASDKS RIPNGIFSIF EASRPMSSPT

151 RVILKAVFFT TSATSVNVVA SEFSNAAFTT PGPDTPTLIT ASASPEP*NA

201 PAIXGLSSXA LQNTTILAQP KPSSVISXVR LMVSPASLTA SILIPARVLP

251 ILMELHTISV VFIASGMERX NTSSEGDIPF CTSAEKPPIK DTPMALAALS

301 KVCATLT*

It should be noted that this protein includes a stop codon at position 198. 



ORF24a and ORF24-1 show 96.4% identity in 307 aa overlap: 

10 20 30 40 50 60 

orf 24a . pep MRTAWLLLIMPMAASSAMMPEMVCAGVSPGTAIISXPTEQTAVIASSLSNVSTPASAAA 
I I I I t I I [ I t I 1 I I I I 1 I I 1 i I I I I t I I i I I I I M I I I I I I I I : I I I I I : I I I I I M I I 
orf 24-1 MRTAWLLLIMPMAASSAMMPEMVCAGVSPGTAIISKPTEQTAVMASSLSSVSTPASAAA 

10 20 30 40 50 60 

70 80 90 100 110 120 

orf 24a. pep IIPSSSXTGINAPLKPPTALEAIMPPFFTASFSNAKAAWPCVPQTLKPrSSRMRATESP 
t I I t M I I I I I I t I M i i I i i I I I I I I t I i I I I I I I I I I I I t I I I I t I t I M M I I t I I 
orf 24-1 IIPSSSETGINAPLKPPTALEAIMPPFFTASFSNAKAAWPCVPQTLKPISSRMRATESP 

70 80 90 100 110 120 

130 140 150 160 170 180 

orf 24a. pep TAGVGASDKSRIPNGIFSIFEASRPMSSPTRVrLKAVFFTTSATSVNWASEFSNAAFTT 
i I I I I I I I M I I I t I I I I I I I I I I I I I I I I I t t I I I I I I M I I I i I I I I I I I I I I I I I It 
orf 24-1 TAGVGASDKSRIPNGIFSIFEASRPMSSPTRVILKAVFFTTSATSVNWASEFSNAAFTT 

130 140 150 160 170 180 

190 200 210 220 230 240 

orf 24a . pep PGPDTPTLITASASPEPXNAPAIXGLSSXALQNTTILAQPKPSSVISXVRLMVSPASLTA 
I I I M I I I I I M I t I I I I I 1 I I I I t I I : I I I I I t i I 1 M t I I : I I I III I t I I I I M 
orf 24-1 PGPDTPTLITASASPEPXNAPAINGLSSTALQNTTILAQPKPSGVISAVRLTVSPASLTA 

190 200 210 220 230 240 

250 260 270 280 290 300 

orf 24a. pep SILIPARVLPILMELHTISWFIASGMERXNTSSEGDIPFCTSAEKPPIKDTPMALAALS 
IIIIIMIIIMIIIIIIIIIIIIIIIM ltllll)lllli:llillilll|IMilM 
orf 24-1 SILIPARVLPILMELHTISWFIASGMERINTSSEGDIPFCTNAEKPPIKDTPMALAALS 

250 260 270 280 290 300 
orf 24a . pep KVCATLTX
I I M i I M 

orf 2 4-1 KVCATLTX 



Homology with a predicted ORF from N.gonorrhoeae

ORF24 shows 96.7% identity over a 121 aa overlap with a predicted ORF (ORF24ng) from
N.gonorrhoeae:



orf24.pep MRTAVVLLLIMPMAASSAMMPEMVCAGVSPGTAIISKPTEQTAVMASSLSSVSTPASAAA 60
          ||||||||||||||||||||||||||||||||||:|||||||||||||||||:|||||||
orf24ng   MRTAVVLLLIMPMAASSAMMPEMVCAGVSPGTAIMSKPTEQTAVMASSLSSVNTPASAAA 60

orf24.pep IIPSSSETGINAPLKPPTALEAIMPPFFTASFSNAKAAVVPCVPQTLKPIXSRMRATXSP 120
          |||||||||||||||||||||||||||||||||||||||||||||||||| |||||| ||
orf24ng   IIPSSSETGINAPLKPPTALEAIMPPFFTASFSNAKAAVVPCVPQTLKPISSRMRATESP 120

orf24.pep TG 122
          |:
orf24ng   TAGVGASDKSRMPNGIFSIFEASRPMSSPTRVILKAVFFTTSATSVRLTASEFSSAALTT 180

The complete length ORF24ng nucleotide sequence <SEQ ID 679> is: 



1 ATGCGCACGG CGGTGGTTTT GCTGTTGATC ATGCCGATGG CGGCTTCGTC
51 GGCGATGATG CCGGAAATGG TGTGCGCGGG CGTGTCGCCG GGAACGGCAA
101 TCATGTCCAA ACCAACGGAG CAGACGGCGG TCATGGCTTC GAGTTTGTCC
151 AGCGTCAACA CGCCTGCCTC GGCGGCGGCA ATCATACCTT CGTCTTCGGA
201 AACGGGGATA AACGCGCCGC TCAAACCGCC GACCGCGCTG GAAGCCATCA
251 TGCCGCCCTT TTTCACGGCA TCGTTCAGCA ATGCCAAAGC TGCTGTTGTG
301 CCGTGCGTAC CGCAGACGCT CAAGCCCATT TCTTCAAGAA TGCGCGCCAC
351 CGAGTCGCCG ACGGCGGGGG TCGGTGCCAG CGACAAATCG AGAATGCCGA
401 ACGGGATATT CAGCATTTTT GAGGCTTCGC GACCGATGAG TTCGCCCACG
451 CGGGTGATTT TGAAAGCGGT TTTCTTCACG ACTTCGGCGA CCTCGGTCAG
501 GCTGACCGCG TCCGAATTTT CCAGCGCGGC TTTGACCACG CCTGGACCGG
551 ATACGCCGAC ATTAATCACA GCATCCGCTT CGCCCGAGCC GTGGAACGCA
601 CCCGCCATAA ACGGATTGTC TTCCACCGCG TTGCAGAACA CGACGATTTT
651 GGCGCAGCCG AAACCTTCGG GTGTGATTTC AGCCGTGCGT TTGATGGTTT
701 CGCCTGCCAG CTTGACCGCA TCCATATTGA TACCGGCACG CGTGCTGCCG
751 ATATTGATGG AGCTGCACAC GATATCGGTA GTTTTCATCG CTTCGGGAAC
801 GGAACGGATC AACACCTCAT CCGAAGGCGA CATACCTTTT TGCACCAGCG
851 CGGAAAAGCC GCCGATAAAG GACACGCCGA TGGCTTTGGC TGCCTTGTCC
901 AAAGTCTGCG CCACGCTGAC ATAA



This encodes a protein having amino acid sequence <SEQ ID 680>: 



1 MRTAVVLLLI MPMAASSAMM PEMVCAGVSP GTAIMSKPTE QTAVMASSLS

51 SVNTPASAAA IIPSSSETGI NAPLKPPTAL EAIMPPFFTA SFSNAKAAVV

101 PCVPQTLKPI SSRMRATESP TAGVGASDKS RMPNGIFSIF EASRPMSSPT

151 RVILKAVFFT TSATSVRLTA SEFSSAALTT PGPDTPTLIT ASASPEPWNA

201 PAINGLSSTA LQNTTILAQP KPSGVISAVR LMVSPASLTA SILIPARVLP

251 ILMELHTISV VFIASGTERI NTSSEGDIPF CTSAEKPPIK DTPMALAALS

301 KVCATLT*

ORF24ng and ORF24-1 show 96.1% identity in 307 aa overlap: 



10 20 30 40 50 60 

or f 2 4 - 1 . pep MRTAWLLLIMPMAAS SAMMPEMVCAGVS PGTAI I SKPTEQTAVMAS SLSS VST PAS AAA 
I II I I I I I I It I II I I I I I II I I I I I I I I 1 I t I I : t II I I I I I I I I ) II I I I : t I II t II 
orf24ng MRTAVVLLLIMPMAASSAMMPEMVCAGVSPGTAIMSKPTEQTAVMASSLSSVNTPASAAA

10 20 30 40 50 60 



70 80 90 100 110 120 

orf 24-1 . pep IIPSSSETGINAPLKPPTALEAIMPPFFTASFSNAKAAWPCVPQTLKPISSRMRATESP 
II I I I I II I I I I t I II I I II I I t I I I 1 II I II M II M I I I I I I I I t I I I I I I i I I I I i I 
orf24ng IIPSSSETGINAPLKPPTALEAIMPPFFTASFSNAKAAVVPCVPQTLKPISSRMRATESP

70 80 90 100 110 120 
130 140 150 160 170 180

orf 24-1. pep TAGVGASDKSRIPNGIFSIFEASRPMSSPTRVILKAVFFTTSATSVNWASEFSNAAFTT 

I I I [ I I I t I I t : I I 1 I I I I M I i I I I I I M I I I I I I I I I I i M I I I : : I i I I t : If : I I 
orf24ng TAGVGASDKSRMPNGIFSIFEASRPMSSPTRVILKAVFFTTSATSVRLTASEFSSAALTT 

130 140 150 160 170 180 

190 200 210 220 230 240 

orf 24-1 . pep PGPDTPTLITASASPEPXNAPAINGLSSTALQNTTILAQPKPSGVISAVRLTVSPASLTA 

I I i 1 1 1 1 1 [ 1 1 1 1 1 1 1 1 1 1 1 i I i i 1 1 M 1 1 1 1 1 1 1 1 1 1 i 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 i 1 1 

orf24ng PGPDTPTLITASASPEPWNAPAINGLSSTALQNTTILAQPKPSGVISAVRLMVSPASLTA 

190 200 210 220 230 240 

250 260 270 280 290 300 

orf 24-1 . pep SILIPARVLPILMELHTISWFIASGMERINTSSEGDIPFCTNAEKPPIKDTPMALAALS 
I I I [ I I I I I t I I I I I I I I I I t I i I M I if I I I t I I I t M n : I M i I I I I I t I I t I I I [ 
orf24ng SILIPARVLPILMELHTISWFXASGTERINTSSEGDIPFCTSAEKPPIKDTPMALAALS 

250 260 270 280 290 300 

orf 24-1. pep KVCATLTX 
I I I t i M I 

orf24ng KVCATLTX 

Based on this analysis, including the presence of a putative leader sequence (first 18 aa,
double-underlined) and putative transmembrane domains (single-underlined) in the gonococcal protein,
it is predicted that the proteins from N.meningitidis and N.gonorrhoeae, and their epitopes, could
be useful antigens for vaccines or diagnostics, or for raising antibodies.

Example 81 

The following partial DNA sequence was identified in N. meningitidis <SEQ ID 681>: 

1 . . ACCGACGTGC AAAAAGAGTT GGTCGGCGAA CAACGCAAGT GGGCGCAGGA 

51 AAAAATCAGC AACTGCCGAC AAGCCGCCGC GCAGGCAGAC CGGCAGGAAT 

101 ACGCCGAATA CCTCAAGCTG CAATGCGACA CGCGGATGAC GCGCGAACGG 

151 ATACAGTATC TTCGCGGCTA TTCCATCGAT TAG 

This corresponds to the amino acid sequence <SEQ ID 682; ORF25>: 

1 ..TDVQKELVGE QRKWAQEKIS NCRQAAAQAD RQEYAEYLKL QCDTRMTRER 
51 IQYLRGYSID * 

Further work revealed the complete nucleotide sequence <SEQ ID 683>: 

1 ATGTATCGGA AACTCATTGC GCTGCCGTTT GCCCTGCTGC TTGCCGCTTG 

51 CGGCAGGGAA GAACCGCCCA AGGCATTGGA ATGCGCCAAC CCCGCCGTGT 

101 TGCAAGGCAT ACGCGGCAAT ATTCAGGAAA CGCTCACGCA GGAAGCGCGT 

151 TCTTTCGCGC GCGAAGACGG CAGGCAGTTT GTCGATGCCG ACAAAATTAT 

201 CGCCGCCGCC TACGGTTTGG CGTTTTCTTT GGAACACGCT TCGGAAACGC 

251 AGGAAGGCGG GCGCACGTTC TGTATCGCCG ATTTGAACAT TACCGTGCCG 

301 TCTGAAACGC TTGCCGATGC CAAGGCAAAC AGCCCCCTGT TGTACGGGGA 

351 T^CTGCTTTG TCGGATATTG TGCGGCAGAA GACGGGCGGC AATGTCGAGT 

401 TTAAAGACGG CGTATTGACG GCAGCCGTCC GCTTCCTGCC CGTCAAAGAC 

451 GGTCAGACGG CATTTGTCGA CAACACGGTC GGTATGGCGG CGCAAACGCT 

501 GTCTGCCGCG CTGCTGCCTT ACGGCGTGAA GAGCATCGTG ATGATAGACG 

551 GCAAGGCGGT GAAAAAAGAA GACGCGGTCA GGATTTTGAG CGGAAAAGCC 

601 CGTGAAGAAG AACCGTCCAA ACCCACGCCC GAAGACATTT TGGAACACAA 

651 TGCCGCCGGC GGCGATGCGG GCGTACCCCA AGCCGCAGAA GGCGCGCCCG 

701 AACCGGAAAT CCTGCATCCT GACGACGGCG AGCGTGCCGA TACCGTTACC 

751 GTATCACGGG GCGAAGTGGA AGAGGCGCGC GTACAAAACC AGCGTGCGGA 

801 ATCCGAAATT ACCAAACTTT GGGGAGGACT CGATACCGAC GTGCAAAAAG 

851 AGTTGGTCGG CGAACAACGC AAGTGGGCGC AGGAAAAAAT CAGCAACTGC 

901 CGACAAGCCG CCGCGCAGGC AGACCGGCAG GAATACGCCG AATACCTCAA 

951 GCTGCAATGC GACACGCGGA TGACGCGCGA ACGGATACAG TATCTTCGCG 

1001 GCTATTCCAT CGATTAG 
This corresponds to the amino acid sequence <SEQ ID 684; ORF25-1>:



1 MYRKLIALPF ALLLAACGRE EPPKALECAN PAVLQGIRGN IQETLTQEAR
51 SFAREDGRQF VDADKIIAAA YGLAFSLEHA SETQEGGRTF CIADLNITVP
101 SETLADAKAN SPLLYGETAL SDIVRQKTGG NVEFKDGVLT AAVRFLPVKD
151 GQTAFVDNTV GMAAQTLSAA LLPYGVKSIV MIDGKAVKKE DAVRILSGKA
201 REEEPSKPTP EDILEHNAAG GDAGVPQAAE GAPEPEILHP DDGERADTVT
251 VSRGEVEEAR VQNQRAESEI TKLWGGLDTD VQKELVGEQR KWAQEKISNC
301 RQAAAQADRQ EYAEYLKLQC DTRMTRERIQ YLRGYSID*



Computer analysis of this amino acid sequence gave the following results: 
Homology with a predicted ORF from N.meningitidis (strain A)

ORF25 shows 98.3% identity over a 60aa overlap with an ORF (ORF25a) from strain A of N.
meningitidis:

10 20 30 

orf25 pep TDVQKELVGEQRKWAQEKISNCRQAAAQAD 

I I I 1 1 t I 1 1 I t i I I I I I I I I M I 1 I i 1 1 I 

orf25a VTVSRGEVEEARVQNQRAESEITKLWGGLDTDVQKELVGEXRKWAQEKISNCRQAAAQAD 
250 260 270 280 290 300 



40 50 60 

or f 25 . pep RQEYAEYLKLQCDTRMTRERIQYLRGYSIDX 
[ I I I I I I I I I I I I I I I I 1 I I I I I I I I I I I I I 
orf25a RQEYAEYLKLQCDTRMTRERIQYLRGYSIDX 
310 320 330 
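
The identity figure quoted for this overlap can be reproduced by counting identical columns in the aligned fragments. The following is a rough sketch under stated assumptions, not the alignment program used for the original analysis: `percent_identity` is a hypothetical helper, the two fragments are taken from the alignment above without the terminal stop symbol, and an X in ORF25a is counted as a mismatch.

```python
def percent_identity(aligned_a: str, aligned_b: str) -> float:
    """Percentage of aligned columns holding the same residue in both rows."""
    if len(aligned_a) != len(aligned_b):
        raise ValueError("aligned sequences must have equal length")
    # Columns where both rows carry a gap are not part of the overlap.
    columns = [(a, b) for a, b in zip(aligned_a, aligned_b)
               if not (a == "-" and b == "-")]
    if not columns:
        raise ValueError("no aligned columns")
    matches = sum(1 for a, b in columns if a == b and a != "-")
    return 100.0 * matches / len(columns)


# 60 aa overlap between ORF25 (partial) and ORF25a, from the alignment above.
orf25 = "TDVQKELVGEQRKWAQEKISNCRQAAAQADRQEYAEYLKLQCDTRMTRERIQYLRGYSID"
orf25a = "TDVQKELVGEXRKWAQEKISNCRQAAAQADRQEYAEYLKLQCDTRMTRERIQYLRGYSID"
print(round(percent_identity(orf25, orf25a), 1))  # 98.3
```

With 59 identical residues out of 60, the sketch gives the same 98.3% value as stated above; other programs may treat ambiguous residues or gaps differently.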

The complete length ORF25a nucleotide sequence <SEQ ID 685> is: 

1 ATGTATCGGA AACTCATTGC GCTGCCGTTT GCCCTGCTGC TTGCCGCTTG 

51 CGGCAGGGAA GAACCGCCCA AGGCATTGGA ATGCGCCAAC CCCGCCGTGT 

101 TGCAANGCAT ACGCNGCAAT ATTCAGGAAA CGCTCACGCA GGAAGCGCGT 

151 TCTTTCGCGC GCGAAGACNG CANGCAGTTT GTCGATGCCG ACNAAATTAT 

201 CGCCGCCGCC TANGNTNNGN NGNTNTCTTT GGAACACGCT TCGGAAACGC 

251 AGGAAGGCGG GCGCACGTTC TGTNTCGCCG ATTTGAACAT TACCGTGCCG 

301 TCTGAAACGC TTGCCGATGC CAAGGCAAAC AGCCCCCTGC TGTACGGGGA 

351 AACCGCTTTG TCGGATATTG TGCGGCAGAA GACGGGCGGC AATGTCGAGT 

401 TTAAAGACGG CGTATTGACG GCAGCCGTCC GCTTCCTACC CGTCAAAGAC 

451 GGTCAGANGG CATTTGTCGA CAACACGGTC GGTATGGCGG CGCAAACGCT 

501 GTCTGCCGCG TTGCTGCCTT ACGGCGTGAA GAGCATCGTG ATGATAGACG 

551 GCAAGGCGGT AAAAAAAGAA GACGCGGTCA GGATTNTGAG CNGANAAGCC 

601 CGTGAANAAG AACCGTCCAA ANCCNNGCCC GAAGACATTT TGGAACATAA 

651 TGCCGCCGGA GGGGATGCAG ACGTACCCCA AGCCGGAGAA GACGCGCCCG 

701 AACCGGAAAT CCTGCATCCT GACGACGGCG AGCGTGCCGA TACCGTTACC 

751 GTATCACGGG GCGAAGTGGA AGAGGCGCGN GTACAAAACC AGCGTGCGGA 

801 ATCCGAAATT ACCAAACTTT GGGGAGGACT CGATACCGAC GTGCAAAAAG 

851 AGTTGGTCGG CGAANAACGC AAGTGGGCGC AGGAAAAAAT CAGCAACTGC 

901 CGACAAGCCG CCGCGCAGGC AGACCGGCAG GAATACGCCG AATACCTCAA 

951 GCTGCAATGC GACACGCGGA TGACGCGCGA ACGGATACAG TATCTTCGCG 

1001 GCTATTCCAT CGATTAG 

This encodes a protein having amino acid sequence <SEQ ID 686>: 



1 MYRKLIALPF ALLLAA CGRE EPPKALECAN PAVLQXIRXN IQETLTQEAR 

51 SFAREDXXQF VDADXIIAAA XXXXXSLEHA SETQEGGRTF CXADLNITVP 

101 SETLADAKAN SPLLYGETAL SDIVRQKTGG NVEFKDGVLT AAVRFLPVKD 

151 GQXAFVDNTV GMAAQTLSAA LLPYGVKSIV MIDGKAVKKE DAVRIXSXXA 

201 REXEPSKXXP EDILEHNAAG GDADVPQAGE DAPEPEILHP DDGERADTVT 

251 VSRGEVEEAR VQNQRAESEI TKLWGGLDTD VQKELVGEXR KWAQEKISNC 

301 RQAAAQADRQ EYAEYLKLQC DTRMTRERIQ YLRGYSID* 

ORF25a and ORF25-1 show 93.5% identity in 338 aa overlap: 



10 20 30 40 50 60 

orf 25a . pep MYRKLIALPFALLLAACGREEPPKALECANPAVLQXIRXNIQETLTQEARSFAREDXXQF 
llllllttniMUIIItllMlllllltlllll li I M I i I It t M I I I I I I 11 
orf 25-1 MYRKLIALPFALLLAACGREEPPKALECANPAVLQGIRGNIQETLTQEARSFAREDGRQF 
10 20 30 40 50 60



70 80 90 100 110 120 

orf 25a . pep VDADXIIAAAXXXXXSLEHASETQEGGRTFCXADLNITVPSETLADAKANSPLLYGETAL 
I I I I i I I t I I M I I I I I t M I I I I I I I I t I I i I I I 1 I I I I I i i I I M I I I I I I 

orf 25-1 VDADKIIAAAYGLAFSLEHASETQEGGRTFCIADLNITVPSETLADAKANSPLLYGETAL 

70 80 90 100 110 120 

130 140 150 160 170 180 

orf 25a . pep SDIVRQKTGGNVEFKDGVLTAAVRFLPVKDGQXAFVDNTVGMAAQTLSAALLPYGVKSIV
t I M I I I I I I I I I I I I I I I I I I I I I I I I i I M : I I I I [ M I I I I I I M I I I i I I I I I I I I 
orf 25-1 SDIVRQKTGGNVEFKDGVLTAAVRFLPVKDGQTAFVDNTVGMAAQTLSAALLPYGVKSIV 

130 140 150 160 170 180 

190 200 210 220 230 240 

orf 25a . pep MIDGKAVKKEDAVRIXSXXAREXEPSKXXPEDILEHNAAGGDADVPQAGEDAPEPEILHP 
I M I I M I I I I I I t t I Mi I I I I : I I I I [ i I I I I M I I t I I I : I I I I I I I I I I 
orf 25-1 MIDGKAVKKEDAVRILSGKAREEEPSKPTPEDILEHNAAGGDAGVPQAAEGAPEPEILHP

190 200 210 220 230 240 

250 260 270 280 290 300 

orf 25a . pep DDGERADTVTVSRGEVEEARVQNQRAESEITKLWGGLDTDVQKELVGEXRKWAQEKISNC 
M I I I I I I I I I 1 M I i I I I I I I I I I I I I i I I I 1 I I I I I I I I i I I I I I t I I I I I I I I I M 
orf 25 - 1 DDGERADTVTVSRGEVEEARVQNQRAESEITKLWGGLDTDVQKELVGEQRKWAQEKISNC 

250 260 270 280 290 300 

310 320 330 339 

or f 25a . pep RQAAAQADRQEYAEYLKLQCDTRMTRERIQYLRGYSIDX 
M I I t I I I I I I i I I I t I M M I I i I I I I I It I I I I [ I I I 
orf 25-1 RQAAAQADRQEYAEYLKLQCDTRMTRERIQYLRGYSIDX 

310 320 330 

Homology with a predicted ORF from N.gonorrhoeae

ORF25 shows 100% identity over a 60aa overlap with a predicted ORF (ORF25ng) from 
N.gonorrhoeae: 



orf25.pep                               TDVQKELVGEQRKWAQEKISNCRQAAAQAD 30
                                        ||||||||||||||||||||||||||||||
orf25ng   VTVSRGEVEEARVQNQRAESEITKLWGGLDTDVQKELVGEQRKWAQEKISNCRQAAAQAD 308

orf25.pep RQEYAEYLKLQCDTRMTRERIQYLRGYSID 60
          ||||||||||||||||||||||||||||||
orf25ng   RQEYAEYLKLQCDTRMTRERIQYLRGYSID 338



The complete length ORF25ng nucleotide sequence <SEQ ID 687> is: 



1 ATGTATCGGA AACTCATTGC GCTGCCGTTT GCCCTGCTGC TTGCAGCGTG
51 CGGCAGGGAA GAACCGCCCA AGGCGTTGGA ATGCGCCAAC CCCGCCGTGT
101 TGCAGGACAT ACGCGGCAGT ATTCAGGAAA CGCTCACGCA GGAAGCGCGT
151 TCTTTCGCGC GCGAAGACGG CAGGCAGTTT GTCGATGCCG ACAAAATTAT
201 CGCCGCCGCC TACGGTTTGG CGTTTTCTTT GGAACACGCT TCGGAAACGC
251 AGGAAGGCGG GCGCACGTTC TGTATCGCCG ATTTGAACAT TACCGTGCCG
301 TCTGAAACGC TTGCCGATGC CGAGGCAAAC AGCCCCCTGC TGTATGGGGA
351 AACGTCTTTG GCAGACATCG TGCAGCAGAA GACGGGCGGC AATGTCGAGT
401 TTAAAGACGG CGTATTGACG GCAGCCGTCC GCTTCCTGCC CGCCAAAGAC
451 GCTCGGACGG CATTTATCGA CAACACGGTC GGTATGGCGA CGCAAACGCT
501 GTCTGCCGCG TTGCTGCCTT ACGGCGTGAA GAGCATCGTG ATGATAGACG
551 GCAAGGCGGT GACAAAAGAA GACGCGGTCA GGGTTTTGAG CGGCAAAGCC
601 CGTGAAGAAG AACCGTCCAA ACCCACCCCC GAAGACATTT TGGAACACAA
651 TGCCGCCGGC GGCGATGCGG GCGTACCCCA AGCCGCAGAA GGCGCACCCG
701 AACCCGAAAT CCTGCATCCC GACGACGTCG AGCGTGCCGA TACCGTTACC
751 GTATCACGGG GCGAAGTGGA AGAGGCGCGC GTACAAAACC AACGTGCGGA
801 ATCCGAAATT ACCAAACTTT GGGGAGGACT CGATACCGAC GTGCAAAAAG
851 AGTTGGTCGG CGAACAGCGC AAGTGGGCGC AGGAAAAAAT CAGcaactgc
901 cgACAAGCCG CCGCGCAGGC AGACCGGCAG GAATACGCCG AATACCTCAA
951 GCTCCAATGC GACACGCGGA TGACGCGCGA ACggaTACAG TATCTTCGCG
1001 GCTATTCCAT CGATTAG



This encodes a protein having amino acid sequence <SEQ ID 688>:

1 MYRKLIALPF ALLLAACGRE EPPKALECAN PAVLQDIRGS IQETLTQEAR

51 SFAREDGRQF VDADKIIAAA YGLAFSLEHA SETQEGGRTF CIADLNITVP 

101 SETLADAEAN SPLLYGETSL ADIVQQKTGG NVEFKDGVLT AAVRFLPAKD 

151 ARTAFIDNTV GMATQTLSAA LLPYGVKSIV MIDGKAVTKE DAVRVLSGKA 

201 REEEPSKPTP EDILEHNAAG GDAGVPQAAE GAPEPEILHP DDVERADTVT

251 VSRGEVEEAR VQNQRAESEI TKLWGGLDTD VQKELVGEQR KWAQEKISNC 

301 RQAAAQADRQ EYAEYLKLQC DTRMTRERIQ YLRGYSID* 

ORF25ng and ORF25-1 show 95.9% identity in 338 aa overlap: 

10 20 30 40 50 60 

orf 25-1 . pep MYRKLIALPFALLLAACGREEPPKALECANPAVLQGIRGNIQETLTQEARSFAREDGRQF

I I I I I I I I I I I I I I I I I M [ I I i I I I I t t I I I i I I I I I : I I I I i t I I I I I I I I I I ) I I I 
orf25ng MYRKLIALPFALLLAACGREEPPKALECAN PAVLQDIRGS I QETLTQEARS FARE DGRQF 

10 20 30 40 50 60 

70 80 90 100 110 120

orf 25-1. pep VDADKIIAAAYGLAFSLEHASETQEGGRTFCIADLNITVPSETLADAKANSPLLYGETAL 
I I I I I I I I I I I I I I I I M I I I I I I I I I t I I I t I I I I I I I I I t I I I I I : t I I I I I I i I t : i 
orf25ng VDADKIIAAAYGLAFSLEHASETQEGGRTFCIADLN I TVPSETLADAE AN SPLLYGETSL 

70 80 90 100 110 120 


130 140 150 160 170 180 

orf 25-1 . pep SDIVRQKTGGNVEFKDGVLTAAVRFLPVKDGQTAFVDNTVGMAAQTLSAALLPYGVKSIV 
: I I I : I It 1 I I I I I I I I t I I t I I I I I I : I I :: I It : I I I I I i I : I I I I M I i I I I I M I I 
orf25ng ADIVQQKTGGNVEFKDGVLTAAVRFLPAKDARTAFIDNTVGMATQTLSAALLPYGVKSIV 
130 140 150 160 170 180

190 200 210 220 230 240 

orf 25-1 . pep MIDGKAVKKEDAVRILSGKAREEEPSKPTPEDILEHNAAGGDAGVPQAAEGAPEPEILHP 
I II 1 I II I I I I I t : I I t I I It I I I I II I I I I II II 11 I I I I I I I II I i It II t I i I I I t 
orf25ng MIDGKAVTKEDAVRVLSGKAREEEPSKPTPEDILEHNAAGGDAGVPQAAEGAPEPEILHP

190 200 210 220 230 240 

250 260 270 280 290 300 

orf 25-1 . pep DDGERADTVTVSRGEVEEARVQNQRAESEITKLWGGLDTDVQKELVGEQRKWAQEKISNC 
|| |||||||||||||||||||||||||||||||||||||||||||||||||||||||||

orf25ng DDVERADTVTVSRGEVEEARVQNQRAESEITKLWGGLDTDVQKELVGEQRKWAQEKI SNC 

250 260 270 280 290 300 

310 320 330 339 

orf 25-1 . pep RQAAAQADRQEYAEYLKLQCDTRMTRERIQYLRGYSIDX

I I I I II I I I I I II M II ! I I I I I I II I I I t I I I I t I II I 
o r f 2 5ng RQAAAQADRQE YAE YLKLQCDTRMTRERIQYLRG YS I DX 

310 320 330 

Based on this analysis, including the presence of a predicted prokaryotic membrane lipoprotein
lipid attachment site (underlined) in the gonococcal protein, it was predicted that the proteins from
N.meningitidis and N.gonorrhoeae, and their epitopes, could be useful antigens for vaccines or
diagnostics, or for raising antibodies.
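
A prediction of this kind can be illustrated with a simple pattern search near the N-terminus. The sketch below is not the analysis tool used for the original work. It assumes that the lipid attachment site consensus has the form given in PROSITE entry PS00013, `{DERK}(6)-[LIVMFWSTAG](2)-[LIVMFYSTAGCQ]-[AGS]-C`, which should be checked against the PROSITE entry before use; the helper name `find_lipobox_cysteine`, the 40-residue search window, and the omission of PROSITE's additional positional rules are further assumptions.

```python
import re

# Assumed form of the PROSITE PS00013 consensus, written as a regular expression.
LIPOBOX = re.compile(r"[^DERK]{6}[LIVMFWSTAG]{2}[LIVMFYSTAGCQ][AGS]C")


def find_lipobox_cysteine(protein: str, window: int = 40):
    """Return the 1-based position of the candidate lipidated cysteine, or None."""
    match = LIPOBOX.search(protein[:window])
    # The pattern ends on the cysteine, so the end offset is its 1-based position.
    return match.end() if match else None


# First 40 residues of ORF25ng (SEQ ID 688) as listed above.
orf25ng_start = "MYRKLIALPFALLLAACGREEPPKALECANPAVLQDIRGS"
print(find_lipobox_cysteine(orf25ng_start))  # 17
```

Under these assumptions the candidate cysteine is residue 17, the C of the ALLLAACGRE block in the gonococcal sequence.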

ORF25-1 (37kDa) was cloned in pET and pGex vectors and expressed in E.coli, as described
above. The products of protein expression and purification were analyzed by SDS-PAGE. Figure
16A shows the results of affinity purification of the GST-fusion protein, and Figure 16B shows the
results of expression of the His-fusion in E.coli. Purified His-fusion protein was used to immunise
mice, whose sera were used for Western blot (Figure 16C), ELISA (positive result), and FACS
analysis (Figure 16D). These experiments confirm that ORF25-1 is a surface-exposed protein, and
that it is a useful immunogen.
Figure 16E shows plots of hydrophilicity, antigenic index, and AMPHI regions for ORF25-1.



Example 82 



The following partial DNA sequence was identified in N. meningitidis <SEQ ID 689>:



1 ATGCAGCTGA TCGACTATTC ACATTCATTT TTCTCGGTTG TGCCACCCTT
51 TTTGGCACTG GCACTTGCCG TCATTACCCG CCGCGTACTG CTGTCTTTAG
101 GCATCGGTAT TCTGGwysGC GTTGCCTTTT TGGTCGGCGG CAACCCCGTC
151 GACGGTCTGA CACACCTGAA AGACATGGTC GTCGGCTTGG CTTGGTCAGA
201 CGsyGATTGG TCGCTGGGCA AACCAAAAAT CTTGGTTTTC CkGATACTTT
251 TGGGTATTTT TACTTCCCTG CTGACCTACT CCGGCAGCAA T

//

851 AC TTCGCTGGTA
901 TTCGGCGGCA CTTGCGGCGT CTTTGCCGTC GTTCTCTGCA CGCTCGGCAC
951 GATTAAAACC GCCGACTATC CCAAAGCCGT TTGGCAGGGT GCGAAATCTA
1001 TGTTCGGCGC AATCGCCATT TTAATCCTCG CTTGGCTCAT CAGTACGGTT
1051 GTCGGCGAAA TGCACACCGG CGATTACCTC TCCACACTGG TTGCGGGCAA
1101 CATCCATCCC GGCTTCCTGC CCGTCATCCT CTTCCTGCTC GCCAGCGTGA
1151 TGGCGTTTGC CACAGGCACA AGCTGGGGGA CGTTCGGCAT TATGCTGCCG
1201 ATTGCCGCCG CCATGGCGGT CAAAGTCGAA CCCGCGCTGA TTATCCCGTG
1251 TATGTCCGCA GTAATGGCGG GGGCGGTATG CGGCGACCAC TGCTCGCCCA
1301 TTTCCGACAC GACCATCCTG TCGTCCACCG GCGCGCGCTG CAACCACATC
1351 GACCACGTTA CCTCGCAACT GCCTTACGCC TTAACCGTTG CCGCCGCCGC
1401 CGCATCGGGC TACCTCGCAT TGGGTCTGAC AAAATCCGCG CTGTTGGGCT
1451 TTGGCACGAC AGGCATTGTA TTGGCGGTGC TGATTTTTCT GTTGAAAGAT
1501 AAAAAA. .



This corresponds to the amino acid sequence <SEQ ID 690; ORF26>:

1 MQLIDYSHSF FSVVPPFLAL ALAVITRRVL LSLGIGILXX VAFLVGGNPV
51 DGLTHLKDMV VGLAWSDXDW SLGKPKILVF XILLGIFTSL LTYSGSN. . .

//

251 TSLV

301 FGGTCGVFAV VLCTLGTIKT ADYPKAVWQG AKSMFGAIAI LILAWLISTV
351 VGEMHTGDYL STLVAGNIHP GFLPVILFLL ASVMAFATGT SWGTFGIMLP
401 IAAAMAVKVE PALIIPCMSA VMAGAVCGDH CSPISDTTIL SSTGARCNHI
451 DHVTSQLPYA LTVAAAAASG YLALGLTKSA LLGFGTTGIV LAVLIFLLKD
501 KK..

Further work revealed the complete nucleotide sequence <SEQ ID 691>: 



1 ATGCAGCTGA TCGACTATTC ACATTCATTT TTCTCGGTTG TGCCACCCTT
51 TTTGGCACTG GCACTTGCCG TCATTACCCG CCGCGTACTG CTGTCTTTAG
101 GCATCGGTAT TCTGGTCGGC GTTGCCTTTT TGGTCGGCGG CAACCCCGTC
151 GACGGTCTGA CACACCTGAA AGACATGGTC GTCGGCTTGG CTTGGTCAGA
201 CGGCGATTGG TCGCTGGGCA AACCAAAAAT CTTGGTTTTC CTGATACTTT
251 TGGGTATTTT TACTTCCCTG CTGACCTACT CCGGCAGCAA TCAGGCGTTT
301 GCCGACTGGG CAAAACGGCA CATTAAAAAC CGGCGCGGCG CGAAAATGCT
351 GACCGCCTGC CTCGTGTTCG TAACCTTTAT CGACGACTAT TTCCACAGTC
401 TCGCCGTCGG TGCGATTGCC CGCCCCGTTA CCGACAAGTT TAAAGTTTCC
451 CGCACCAAAC TCGCCTACAT CCTCGACTCC ACTGCCGCTC CTATGTGCGT
501 GCTGATGCCC GTTTCAAGCT GGGGCGCGTC GATTATCGCC ACGCTTGCCG
551 GACTGCTCGT TACCTACAAA ATCACCGAAT ACACGCCGAT GGGGACGTTT
601 GTCGCCATGA GCCTGATGAA CTATTACGCA CTGTTTGCCC TGATTATGGT
651 GTTCGTCGTC GCATGGTTTT CCTTCGACAT CGGCTCGATG GCACGTTTCG
701 AACAAGCCGC GTTGAACGAA GCCCACGATG AAACTGCCGT TTCAGACGCT
751 ACCAAAGGTC GTGTTTACGC ACTGATTATT CCCGTTTTGG CCTTAATCGC
801 CTCAACGGTT TCCGCCATGA TCTACACCGG CGCGCAGGCA AGCGAAACCT
851 TCAGCATTTT GGGGGCATTT GAAAACACGG ACGTAAACAC TTCGCTGGTA
901 TTCGGCGGCA CTTGCGGCGT CCTTGCCGTC GTTCTCTGCA CGCTCGGCAC
951 GATTAAAACC GCCGACTATC CCAAAGCCGT TTGGCAGGGT GCGAAATCTA
1001 TGTTCGGCGC AATCGCCATT TTAATCCTCG CTTGGCTCAT CAGTACGGTT
1051 GTCGGCGAAA TGCACACCGG CGATTACCTC TCCACACTGG TTGCGGGCAA
1101 CATCCATCCC GGCTTCCTGC CCGTCATCCT CTTCCTGCTC GCCAGCGTGA
1151 TGGCGTTTGC CACAGGCACA AGCTGGGGGA CGTTCGGCAT TATGCTGCCG
1201 ATTGCCGCCG CCATGGCGGT CAAAGTCGAA CCCGCGCTGA TTATCCCGTG
1251 TATGTCCGCA GTAATGGCGG GGGCGGTATG CGGCGACCAC TGCTCGCCCA
1301 TTTCCGACAC GACCATCCTG TCGTCCACCG GCGCGCGCTG CAACCACATC
1351 GACCACGTTA CCTCGCAACT GCCTTACGCC TTAACCGTTG CCGCCGCCGC

1401 CGCATCGGGC TACCTCGCAT TGGGTCTGAC AAAATCCGCG CTGTTGGGCT 

1451 TTGGCACGAC AGGCATTGTA TTGGCGGTGC TGATTTTTCT GTTGAAAGAT 

1501 AAAAAACGCG CCAACGCCTG A 

This corresponds to the amino acid sequence <SEQ ID 692; ORF26-1>:



1 MQLIDYSHSF FSVVPPFLAL ALAVITRRVL LSLGIGILVG VAFLVGGNPV

51 DGLTHLKDMV VGLAWSDGDW SLGKPKILVF LILLGIFTSL LTYSGSNQAF

101 ADWAKRHIKN RRGAKMLTAC LVFVTFIDDY FHSLAVGAIA RPVTDKFKVS

151 RTKLAYILDS TAAPMCVLMP VSSWGASIIA TLAGLLVTYK ITEYTPMGTF

201 VAMSLMNYYA LFALIMVFVV AWFSFDIGSM ARFEQAALNE AHDETAVSDA

251 TKGRVYALII PVLALIASTV SAMIYTGAQA SETFSILGAF ENTDVNTSLV

301 FGGTCGVLAV VLCTLGTIKT ADYPKAVWQG AKSMFGAIAI LILAWLISTV

351 VGEMHTGDYL STLVAGNIHP GFLPVILFLL ASVMAFATGT SWGTFGIMLP

401 IAAAMAVKVE PALIIPCMSA VMAGAVCGDH CSPISDTTIL SSTGARCNHI

451 DHVTSQLPYA LTVAAAAASG YLALGLTKSA LLGFGTTGIV LAVLIFLLKD

501 KKRANA*

Computer analysis of this amino acid sequence gave the following results: 



Homology with the hypothetical transmembrane protein HI1586 of H.influenzae (accession number P44263)
ORF26 and HI1586 show 53% and 49% amino acid identity in 97 and 221 aa overlap at the
N-terminus and C-terminus, respectively:

Orf26 1 MQLIDYSHSFFSWPPFLALALAVITRRVXXXXXXXXXXXVAFLVGGNPVDGLTHLKDMV 60 

M+LID+S S +S+VP LA+ LA+ TRRV L +L V 

HI1586 14 MELIDFSSSVWSIVPALLAIILAIATRRVLVSLSAGIIIGSLMLSDWQIGSAFNYLVKNV 73 

0rf26 61 VGLAWSDXDWSLGKPKILVFXILLGIFTSLLTYSGSN 97 

V L ++D + + I++F +LLG+ T+LLT SGSN 

HI1586 74 VSLVYADGEIN-SNMNIVLFLLLLGVLTALLTVSGSN 109 



// 



0rf26 86 IFTSLLTYSGS— NTSLVFGGTCGVFAWLCTL—GTIKTADYPKAVWQGAKSMFGXXXX 141 

+F+ L T+ + TSLV GG C + L + + +Y ++ G KSM G 
HI1586 299 VFSVLGTFENTWGTSLWGGFCSIIISTLLIILDRQVSVPEYVRSWIVGIKSMSGAIAI 358 

0rf26 142 XXXXXXX ST WGEMHTGDYLSTLVAGN I HPGFLPVILFLLASVMAFATGT SWGTFGIMLP 201 

+ +VG+M TG YLS+LV+GNI FLPVILF+L + MAF+TGTSWGTFGIMLP 
HI1586 359 LFFAWTINKIVGDMQTGKYLSSLVSGNIPMQFLPVILFVLGAAMAFSTGTSWGTFGIMLP 418 

Orf26 202 lAAAMAVKVE PAL 1 1 PCMSAVMAGAVCGDHCS PI SDTTILSSTGARCNHIDHVTSQXXXX 261 

lAAAMA P L++PC+SAVMAGAVCGDHCSP+SDTTILSSTGA+CNHIDHVT+Q 
HI1586 419 IAAAMA/««AAPELLLPCLSAVMAGAVCGDHCSPVSDTTILSSTGAKCNHIDHVTTQLPYA 478 

0rf26 262 XXXXXXXXXXXXXXXXXKSALLGFGTTGIVLAVLIFLLKDK 302 

S L GF T + L V+IF +K + 
HI1586 479 ATVATATSIGYIWGFTYSGLAGFAATAVSLIVIXFAVKKR 519 



Homology with a predicted ORF from N. meningitidis (strain A) 

ORF26 shows 58.2% identity over a 502aa overlap with an ORF (ORF26a) from strain A of N.
meningitidis:



10 20 30 40 50 60 

orf 26 . pep MQLIDYSHSFFSWPPFLALA LAVITR RVLLSLGIGILXXVAFLV GGNPVDGLTHLKDMV 
I I I I I t t tl I I I I I I I i I I I i I I I I I I I I t I I I M I I t I I I I i I I I I t I i I I I I I I I I 
orf 26a MQLIDYSHSFFSWPPFLALA LAVITRR VLLSLGIGILVGVAFLV GGNPVDGLTHLKDMV 

10 20 30 40 50 60 



70 80 90 99 

orf 26 . pep VGLAWSDXDWSLGKPK ILVFXILLGIFTSLLTY SGSNXX 

I I I I I I t I I I I I i I I lit I II M I I I II I II I I I 
orf 26a VGLAWSDGDWSLGKP KXLVFLILLGIFTSLLTY SGSNQAFADWAKRHIKN RRGAKMLTAC 

70 80 90 100 110 120 
orf26.pep

orf26a LVFVTFID DYFHSLAVGAXARPVTDKFKVSRAKLAYILDSTAAPMCVLMP VSSWGASIIA 
130 140 150 160 170 180 



TIAGLLVT YKITEYTPMGTFVAMSLMNYY ALFALIMVFVVAWFSFDI GSMARFEQAALNE 
190 200 210 220 230 240 

100 110 

TSLV 

I I I I 

AHDETAV5DGSWGRVY ALIIPVLALIASTVSAMI YTGAQASET FS I LGAFENTDVNTSLV 
250 260 270 280 290 300 

120 130 140 150 160 170 

FGGTCGVFAWLCTL GTIKTADYPKAVWQGAKS MFGAIAILILAWLISTW GEMHTGDYL 

1 1 1 1 1 1 1 : 1 1 1 1 1 1 1 1 1 1 1 I n I I I I I I I I I I I I I I I I I I I I I t I I i I I I I I I I I I M I 

FGGTCGVLAWLCTL GT IKIADYPKAVWQGAKSM FGAIAILILAWLISTW GEMHTGDYL 
310 320 330 340 350 360 

180 190 200 210 220 230 

STLVAGNIH PGFLPVILFLLASVMAFA T6TSW GTFGIMLPIAAAMAVKVE P ALIIPCMSA 
t I I i I I I I I M I I I I I M I I i I I I t I I I I I I I I I I I I I i i i I I I I I I I : I : I t I I I I I I 
STLVAGNIHP GFLXVILFLLASVMAFA TGTSW GTFGIMLPIAAAMAVKV DP SLIIPCMSA 
370 380 390 400 410 420 

240 250 260 270 280 290 

VMAGAVCG DHCSPISDTTILSSTGARCNHIDHVTSQLP YALTVAAAAASGYLALGL TKSA 
I I I I I I M I I I I I I I I I I M I I I I I I I M I i i I I I I I I I I M I I I I I I i I I I I I I I I I I I 
VMAGAVCG DHC S P I S DTT ILS STGARCNHI DHVT SQLP YALTVAAAAASGYIALGL TKSA 
430 440 450 460 470 480 

300 310 
LLGFGTTGIVLAVLIFL LKDKK 
I I M i : t I i i I I I I I I I I I I I I 
LLGFGXTGI VLAVLI FL LKDKKRANAX 
490 500 

The complete length ORF26a nucleotide sequence <SEQ ID 693> is: 

1 ATGCAGCTGA TCGACTATTC ACATTCATTT TTCTCGGTTG TGCCACCCTT 

51 TTTGGCACTG GCACTTGCCG TCATTACCCG CCGCGTACTG CTGTCTTTAG 

101 GCATCGGTAT TCTGGTCGGC GTTGCCTTTT TGGTCGGCGG CAACCCCGTC 

151 GACGGTCTGA CACACCTGAA AGACATGGTC GTCGGCTTGG CTTGGTCAGA 

201 CGGCGATTGG TCGCTGGGCA AACCAAAANT CTTGGTTTTC CTGATACTTT 

251 TGGGTATTTT TACTTCCCTG CTGACCTACT CCGGCAGCAA TCAGGCGTTT 

301 GCCGACTGGG CAAAACGGCA CATTAAAAAC CGGCGCGGCG CGAAAATGCT 

351 GACCGCCTGC CTCGTGTTCG TAACCTTTAT CGACGACTAT TTCCACAGTC 

401 TCGCCGTCGG TGCGNTTGCC CGCCCCGTTA CCGACAAGTT TAAAGTTTCC 

451 CGCGCCAAAC TCGCCTACAT CCTCGACTCC ACTGCCGCGC CTATGTGCGT 

501 GCTGATGCCC GTTTCAAGCT GGGGCGCGTC GATTATCGCC ACGCTTGCCG 

551 GACTGCTCGT TACCTACAAA ATCACCGAAT ACACGCCGAT GGGGACGTTT 

601 GTCGCCATGA GCCTGATGAA CTATTACGCA CTGTTTGCCC TGATTATGGT 

651 GTTCGTCGTC GCATGGTTCT CCTTCGACAT CGGCTCGATG GCACGTTTCG 

701 AACAAGCCGC GTTGAACGAA GCCCACGATG AAACTGCCGT TTCAGACGGC 

751 AGCTGGGGCA GGGTTTACGC ATTGATTATT CCCGTTTTGG CCTTAATCGC 

801 CTCAACGGTT TCCGCCATGA TCTACACCGG TGCACAGGCA AGCGAAACCT 

851 TCAGCATTTT GGGTGCATTT GAAAATACGG ACGTGAACAC TTCGCTGGTA 

901 TTCGGCGGCA CTTGCGGCGT GCTTGCCGTC GTCCTCTGCA CGCTCGGCAC 

951 GATTAAAATC GCCGATTATC CCAAAGCCGT TTGGCAGGGT GCGAAATCCA 

1001 TGTTCGGCGC AATCGCCATT TTAATCCTTG CCTGGCTCAT CAGTACGGTT 

1051 GTCGGCGAAA TGCACACAGG CGACTACCTC TCCACGCTGG TTGCGGGCAA 

1101 CATCCATCCC GGCTTCCTGN CCGTCATCCT TTTCCTGCTC GCCAGCGTGA 

1151 TGGCGTTTGC CACAGGCACA AGCTGGGGGA CGTTCGGCAT CATGCTGCCG 

1201 ATTGCCGCCG CCATGGCGGT CAAAGTCGAT CCCTCACTGA TTATCCCGTG 

1251 TATGTCCGCC GTGATGGCGG GGGCGGTATG CGGCGACCAC TGCTCGCCCA 

1301 TTTCCGACAC GACCATCCTG TCGTCCACCG GCGCGCGCTG CAACCACATC 



1351 GACCACGTTA CNTCGCAACT GCCTTACGCC TTAACCGTTG CCGCCGCCGC 

1401 CGCATCGGGN TACCTCGCAT TGGGTCTGAC AAAATCCGCG CTGTTGGGTT 

1451 TTGGCANGAC AGGCATTGTA TTGGCGGTGC TGATTTTTCT GTTGAAAGAT 

1501 AAAAAACGCG CCAACGCCTG A 

This encodes a protein having amino acid sequence <SEQ ID 694>: 

1 MQLIDYSHSF FSVVPPFLAL ALAVITRRVL LSLGIGILVG VAFLVGGNPV 

51 DGLTHLKDMV VGLAWSDGDW SLGKPKXLVF LILLGIFTSL LTYSGSNQAF 

101 ADWAKRHIKN RRGAKMLTAC LVFVTFIDDY FHSLAVGAXA RPVTDKFKVS 

151 RAKLAYILDS TAAPMCVLMP VSSWGASIIA TLAGLLVTYK ITEYTPMGTF 

201 VAMSLMNYYA LFALIMVFVV AWFSFDIGSM ARFEQAALNE AHDETAVSDG 

251 SWGRVYALII PVLALIASTV SAMIYTGAQA SETFSILGAF ENTDVNTSLV 

301 FGGTCGVLAV VLCTLGTIKI ADYPKAVWQG AKSMFGAIAI LILAWLISTV 

351 VGEMHTGDYL STLVAGNIHP GFLXVILFLL ASVMAFATGT SWGTFGIMLP 

401 IAAAMAVKVD PSLIIPCMSA VMAGAVCGDH CSPISDTTIL SSTGARCNHI 

451 DHVTSQLPYA LTVAAAAASG YLALGLTKSA LLGFGXTGIV LAVLIFLLKD 

501 KKRANA* 
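
The correspondence between a nucleotide listing and the amino acid listing it encodes can be checked mechanically. The following Python sketch is illustrative only and is not part of the original disclosure. It assumes the standard genetic code, and it assumes that a codon containing an ambiguous base (N) is reported as X unless every possible reading of that codon gives the same residue. The names `translate` and `translate_codon` are hypothetical.

```python
from itertools import product

# Standard genetic code, built in TCAG order (assumption: no alternative
# translation table is needed for these sequences).
BASES = "TCAG"
AMINO = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {"".join(c): aa for c, aa in zip(product(BASES, repeat=3), AMINO)}


def translate_codon(codon):
    """Translate one codon; N is expanded to all four bases."""
    options = [BASES if base == "N" else base for base in codon]
    residues = {CODON_TABLE.get("".join(c), "X") for c in product(*options)}
    # A single consistent residue is kept, anything else becomes X.
    return residues.pop() if len(residues) == 1 else "X"


def translate(dna):
    """Translate a listing in frame 1, ignoring whitespace between blocks."""
    seq = "".join(dna.split()).upper()
    return "".join(translate_codon(seq[i:i + 3]) for i in range(0, len(seq) - 2, 3))


# Last 21 bases of <SEQ ID 693> as listed above (positions 1501-1521).
print(translate("AAAAAACGCG CCAACGCCTG A"))
```

Applied to the last 21 bases of <SEQ ID 693>, the sketch gives the C-terminal residues KKRANA followed by the stop symbol (*), in agreement with <SEQ ID 694>.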

ORF26a and ORF26-1 show 97.8% identity in 506 aa overlap: 



                     10        20        30        40        50        60
orf26a.pep   MQLIDYSHSFFSVVPPFLALALAVITRRVLLSLGIGILVGVAFLVGGNPVDGLTHLKDMV
             ||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
orf26-1      MQLIDYSHSFFSVVPPFLALALAVITRRVLLSLGIGILVGVAFLVGGNPVDGLTHLKDMV
                     10        20        30        40        50        60

                     70        80        90       100       110       120
orf26a.pep   VGLAWSDGDWSLGKPKXLVFLILLGIFTSLLTYSGSNQAFADWAKRHIKNRRGAKMLTAC
             |||||||||||||||| |||||||||||||||||||||||||||||||||||||||||||
orf26-1      VGLAWSDGDWSLGKPKILVFLILLGIFTSLLTYSGSNQAFADWAKRHIKNRRGAKMLTAC
                     70        80        90       100       110       120

                    130       140       150       160       170       180
orf26a.pep   LVFVTFIDDYFHSLAVGAXARPVTDKFKVSRAKLAYILDSTAAPMCVLMPVSSWGASIIA
             |||||||||||||||||| ||||||||||||:||||||||||||||||||||||||||||
orf26-1      LVFVTFIDDYFHSLAVGAIARPVTDKFKVSRTKLAYILDSTAAPMCVLMPVSSWGASIIA
                    130       140       150       160       170       180

                    190       200       210       220       230       240
orf26a.pep   TLAGLLVTYKITEYTPMGTFVAMSLMNYYALFALIMVFVVAWFSFDIGSMARFEQAALNE
             ||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
orf26-1      TLAGLLVTYKITEYTPMGTFVAMSLMNYYALFALIMVFVVAWFSFDIGSMARFEQAALNE
                    190       200       210       220       230       240

                    250       260       270       280       290       300
orf26a.pep   AHDETAVSDGSWGRVYALIIPVLALIASTVSAMIYTGAQASETFSILGAFENTDVNTSLV
             |||||||||:: ||||||||||||||||||||||||||||||||||||||||||||||||
orf26-1      AHDETAVSDATKGRVYALIIPVLALIASTVSAMIYTGAQASETFSILGAFENTDVNTSLV
                    250       260       270       280       290       300

                    310       320       330       340       350       360
orf26a.pep   FGGTCGVLAVVLCTLGTIKIADYPKAVWQGAKSMFGAIAILILAWLISTVVGEMHTGDYL
             ||||||||||||||||||| ||||||||||||||||||||||||||||||||||||||||
orf26-1      FGGTCGVLAVVLCTLGTIKTADYPKAVWQGAKSMFGAIAILILAWLISTVVGEMHTGDYL
                    310       320       330       340       350       360

                    370       380       390       400       410       420
orf26a.pep   STLVAGNIHPGFLXVILFLLASVMAFATGTSWGTFGIMLPIAAAMAVKVDPSLIIPCMSA
             ||||||||||||| |||||||||||||||||||||||||||||||||||:|:||||||||
orf26-1      STLVAGNIHPGFLPVILFLLASVMAFATGTSWGTFGIMLPIAAAMAVKVEPALIIPCMSA
                    370       380       390       400       410       420

                    430       440       450       460       470       480
orf26a.pep   VMAGAVCGDHCSPISDTTILSSTGARCNHIDHVTSQLPYALTVAAAAASGYLALGLTKSA
             ||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
orf26-1      VMAGAVCGDHCSPISDTTILSSTGARCNHIDHVTSQLPYALTVAAAAASGYLALGLTKSA
                    430       440       450       460       470       480

                    490       500
orf26a.pep   LLGFGXTGIVLAVLIFLLKDKKRANAX
             |||||:|||||||||||||||||||||
orf26-1      LLGFGTTGIVLAVLIFLLKDKKRANAX
                    490       500
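
A percent identity figure of this kind can be recomputed from any pair of aligned rows. The Python sketch below is illustrative only: the function name `percent_identity` is hypothetical, gap columns (written `-`) are assumed to be excluded from the overlap, and an ambiguous residue (X) is assumed to count as a mismatch against any determined residue. The alignment program used for the figures above is not identified in this section, so its exact counting rules may differ.

```python
def percent_identity(top, bottom):
    """Return (identical, overlap, percent) for two equal-length aligned rows."""
    if len(top) != len(bottom):
        raise ValueError("aligned rows must have the same length")
    identical = overlap = 0
    for a, b in zip(top, bottom):
        if a == "-" or b == "-":
            continue  # assumption: gap columns are not part of the overlap
        overlap += 1
        if a == b:
            identical += 1
    return identical, overlap, 100.0 * identical / overlap


# Final row of the ORF26a / ORF26-1 alignment, without the terminal stop symbol.
row_26a = "LLGFGXTGIVLAVLIFLLKDKKRANA"
row_26_1 = "LLGFGTTGIVLAVLIFLLKDKKRANA"
print(percent_identity(row_26a, row_26_1))

# 495 identical positions in a 506 aa overlap corresponds to the stated 97.8%.
print(round(100.0 * 495 / 506, 1))
```

For the final row shown, 25 of 26 positions are identical; the only difference is the undetermined residue X in ORF26a.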



Homology with a predicted ORF from N.gonorrhoeae 

ORF26 shows 94.8% and 99% identity in 97 and 206 aa overlap at the N-terminus and C-terminus, 
respectively, with a predicted ORF (ORF26ng) from N.gonorrhoeae: 



orf26.pep    MQLIDYSHSFFSVVPPFLALALAVITRRVLLSLGIGILXXVAFLVGGNPVDGLTHLKDMV  60
             ||||||||||||||||||||||||||||||||||||||  ||||||||||||||||||||
orf26ng      MQLIDYSHSFFSVVPPFLALALAVITRRVLLSLGIGILVGVAFLVGGNPVDGLTHLKDMV  60

orf26.pep    VGLAWSDXDWSLGKPKILVFXILLGIFTSLLTYSGSN  97
             |||||:| |||||||||||| ||||||||||||||||
orf26ng      VGLAWADGDWSLGKPKILVFLILLGIFTSLLTYSGSNQAFADWAKRHIKNRCGAKMLTAC  120

             //

orf26.pep                                  TSLVFGGTCGVFAVVLCTLGTIKTADYPKA  326
                                           |||||||||||:||||||:|||||||||||
orf26ng      ASTVSAMIYTGAQASETFSILGAFENTDVNTSLVFGGTCGVLAVVLCTFGTIKTADYPKA  326

orf26.pep    VWQGAKSMFGAIAILILAWLISTVVGEMHTGDYLSTLVAGNIHPGFLPVILFLLASVMAF  386
             ||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
orf26ng      VWQGAKSMFGAIAILILAWLISTVVGEMHTGDYLSTLVAGNIHPGFLPVILFLLASVMAF  386

orf26.pep    ATGTSWGTFGIMLPIAAAMAVKVEPALIIPCMSAVMAGAVCGDHCSPISDTTILSSTGAR  446
             ||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
orf26ng      ATGTSWGTFGIMLPIAAAMAVKVEPALIIPCMSAVMAGAVCGDHCSPISDTTILSSTGAR  446

orf26.pep    CNHIDHVTSQLPYALTVAAAAASGYLALGLTKSALLGFGTTGIVLAVLIFLLKDKK  502
             ||||||||||||||||||||||||||||||||||||||||||||||||||||||||
orf26ng      CNHIDHVTSQLPYALTVAAAAASGYLALGLTKSALLGFGTTGIVLAVLIFLLKDKKRADV  506

The complete length ORF26ng nucleotide sequence <SEQ ID 695> is: 



1 ATGCAGCTGA TTGACTATTC ACATTCATTT TTCTCGGTTG TGCCACCCTT 

51 TTTGGCACTG GCACTTGCCG TCATTACCCG CCGCGTACTG CTGTCTTTAG 

101 GCATCGGTAT TTTGGTCGGC GTTGCCTTTT TGGTCGGCGG CAACCCCGTC 

151 GACGGTCTGA CACACCTGAA AGACATGGTC GTCGGCTTGG CTTGGGCAGA 

201 CGGCGATTGG TCGCTGGGCA AACCAAAAAT CTTGGTTTTC CTGATACTTT 

251 TGGGCATTTT CACTTCACTG CTGACCTACT CCGGCAGCAA TCAGGCGTTT 

301 GCCGACTGGG CAAAACGGCA CATTAAAAAC CGGTGCGGCG CGAAAATGCT 

351 GACCGCCTGC CTCGTGTTCG TAACCTTTAT CGACGACTAT TTCCACAGCC 

401 TCGCCGTCGG TGCGATTGCC CGCCCCGTTA CCGACAAGTT TAAAGTTTCC 

451 CGCGCCAAAC TCGCCTACAT CCTCGACTCC ACTGCCTCGC CCATGTGCGT 

501 GCTGATGCCC GTTTCAAGCT GGGGCGCGTC GATTATCGCC ACGCTTGCCG 

551 GATTGCTCGT TACCTACAAA ATTACCGAAT ACACGCCGAT GGGGACGTTT 

601 GTCGCCATGA GCCTGATGAA CTATTACGCG CTGTTTGCCC TGATTATGGT 

651 ATTCGTCGTC GCATGGTTCT CCTTCGACAT CGGCTCGAtg gCGCGTTTCG 

701 AACAGGCTGC GTTGAACGAA gcccaggacg aaaccgccgc tTCAGACgCT 

751 ACCAAAGGTC GTGTTTACGC ATTGATTATT CCCGTTTTGG CCTTAATCGC 

801 CTCAACGGTT TCCGCCATGA TCTACACCGG CGCGCAGGCA AGCGAAACCT 

851 TCAGCATTTT GGGGGCATTT GAAAATACCG ACGTAAACAC TTCGCTGGTA 

901 TTCGGCGGCA CTTGCGGCGT GCTTGCCGTC GTCCTCTGCA CGTTCGGCAC 

951 GATTAAAACC GCCGATTATC CCAAAGCCGT GTGGCAGGGT GCGAAATCCA 

1001 TGTTCGGCGC AATCGCCATT TTAATCCTCG CCTGGCTCAT CAGTACGGTT 

1051 GTCGGCGAAA TGCACACGGG CGACTACCTC TCCACGCTGG TTGCGGGCAA 

1101 CATCCATCCC GGCTTCCTGC CCGTCATCCT CTTCCTGCTC GCCAGCGTGA 

1151 TGGCGTTTGC CACAGGCACA AGCTGGGGGA CGTTCGGCAT TATGCTGCCG 

1201 ATTGCCGCCG CCATGGCGGT CAAAGTCGAA CCCGCGCTGA TTAtcccGTG 

1251 TATGTCCGCA GTAATGGCGG GGGCGGTATG CGGCGACCAC TGTTCGCCCA 

1301 TCTCCGACAC GACCATCCTG TCGTCCACCG GCGCGCGCTG CAACCACATC 

1351 GACCACGTTA CCTCGCAACT GCCTTATGCC CTGACGGTTG CCGCCGCCGC 

1401 CGCATCGGGC TACCTCGCAT TGGGTCTGAC AAAATCCGCG CTGTTGGGCT 

1451 TTGGCACGAC CGGTATTGTA TTGGCGGTGC TGATTTTTCT GTTGAAAGAT 

1501 AAAAAACGCG CCGACGTTTG A 

This encodes a protein having amino acid sequence <SEQ ID 696>: 
1 MQLIDYSHSF FSVVPPFLAL ALAVITRRVL LSLGIGILVG VAFLVGGNPV 

51 DGLTHLKDMV VGLAWADGDW SLGKPKILVF LILLGIFTSL LTYSGSNQAF 

101 ADWAKRHIKN RCGAKMLTAC LVFVTFIDDY FHSLAVGAIA RPVTDKFKVS 

151 RAKLAYILDS TASPMCVLMP VSSWGASIIA TLAGLLVTYK ITEYTPMGTF 

201 VAMSLMNYYA LFALIMVFVV AWFSFDIGSM ARFEQAALNE AQDETAASDA 

251 TKGRVYALII PVLALIASTV SAMIYTGAQA SETFSILGAF ENTDVNTSLV 

301 FGGTCGVLAV VLCTFGTIKT ADYPKAVWQG AKSMFGAIAI LILAWLISTV 

351 VGEMHTGDYL STLVAGNIHP GFLPVILFLL ASVMAFATGT SWGTFGIMLP 

401 IAAAMAVKVE PALIIPCMSA VMAGAVCGDH CSPISDTTIL SSTGARCNHI 

451 DHVTSQLPYA LTVAAAAASG YLALGLTKSA LLGFGTTGIV LAVLIFLLKD 

501 KKRADV* 

ORF26ng and ORF26-1 show 98.4% identity in 505 aa overlap: 

                     10        20        30        40        50        60
orf26-1.pep  MQLIDYSHSFFSVVPPFLALALAVITRRVLLSLGIGILVGVAFLVGGNPVDGLTHLKDMV
             ||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
orf26ng      MQLIDYSHSFFSVVPPFLALALAVITRRVLLSLGIGILVGVAFLVGGNPVDGLTHLKDMV
                     10        20        30        40        50        60

                     70        80        90       100       110       120
orf26-1.pep  VGLAWSDGDWSLGKPKILVFLILLGIFTSLLTYSGSNQAFADWAKRHIKNRRGAKMLTAC
             |||||:||||||||||||||||||||||||||||||||||||||||||||| ||||||||
orf26ng      VGLAWADGDWSLGKPKILVFLILLGIFTSLLTYSGSNQAFADWAKRHIKNRCGAKMLTAC
                     70        80        90       100       110       120

                    130       140       150       160       170       180
orf26-1.pep  LVFVTFIDDYFHSLAVGAIARPVTDKFKVSRTKLAYILDSTAAPMCVLMPVSSWGASIIA
             |||||||||||||||||||||||||||||||:||||||||||:|||||||||||||||||
orf26ng      LVFVTFIDDYFHSLAVGAIARPVTDKFKVSRAKLAYILDSTASPMCVLMPVSSWGASIIA
                    130       140       150       160       170       180

                    190       200       210       220       230       240
orf26-1.pep  TLAGLLVTYKITEYTPMGTFVAMSLMNYYALFALIMVFVVAWFSFDIGSMARFEQAALNE
             ||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
orf26ng      TLAGLLVTYKITEYTPMGTFVAMSLMNYYALFALIMVFVVAWFSFDIGSMARFEQAALNE
                    190       200       210       220       230       240

                    250       260       270       280       290       300
orf26-1.pep  AHDETAVSDATKGRVYALIIPVLALIASTVSAMIYTGAQASETFSILGAFENTDVNTSLV
             |:||||:|||||||||||||||||||||||||||||||||||||||||||||||||||||
orf26ng      AQDETAASDATKGRVYALIIPVLALIASTVSAMIYTGAQASETFSILGAFENTDVNTSLV
                    250       260       270       280       290       300

                    310       320       330       340       350       360
orf26-1.pep  FGGTCGVLAVVLCTLGTIKTADYPKAVWQGAKSMFGAIAILILAWLISTVVGEMHTGDYL
             ||||||||||||||:|||||||||||||||||||||||||||||||||||||||||||||
orf26ng      FGGTCGVLAVVLCTFGTIKTADYPKAVWQGAKSMFGAIAILILAWLISTVVGEMHTGDYL
                    310       320       330       340       350       360

                    370       380       390       400       410       420
orf26-1.pep  STLVAGNIHPGFLPVILFLLASVMAFATGTSWGTFGIMLPIAAAMAVKVEPALIIPCMSA
             ||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
orf26ng      STLVAGNIHPGFLPVILFLLASVMAFATGTSWGTFGIMLPIAAAMAVKVEPALIIPCMSA
                    370       380       390       400       410       420

                    430       440       450       460       470       480
orf26-1.pep  VMAGAVCGDHCSPISDTTILSSTGARCNHIDHVTSQLPYALTVAAAAASGYLALGLTKSA
             ||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
orf26ng      VMAGAVCGDHCSPISDTTILSSTGARCNHIDHVTSQLPYALTVAAAAASGYLALGLTKSA
                    430       440       450       460       470       480

                    490       500
orf26-1.pep  LLGFGTTGIVLAVLIFLLKDKKRANAX
             ||||||||||||||||||||||||::|
orf26ng      LLGFGTTGIVLAVLIFLLKDKKRADVX
                    490       500



In addition, ORF26ng shows significant homology to a hypothetical H.influenzae protein: 
sp|P44263|YF86_HAEIN HYPOTHETICAL PROTEIN HI1586 >gi|1074850|pir||C64037 hypothetical 
protein HI1586 - Haemophilus influenzae (strain Rd KW20) >gi|1574427 (U32832) H. 
influenzae predicted coding region HI1586 [Haemophilus influenzae] Length = 519 
Score = 538 bits (1370), Expect = e-152 

Identities = 280/507 (55%), Positives = 346/507 (68%), Gaps = 7/507 (1%) 

Query: 1 MQLIDYSHSFFSVVPPFLALALAVITRRXXXXXXXXXXXXXAEXVGGNPVDGLTHLKDMV 60 
M+LID+S S +S+VP LA+ LA+ TRR L +L V 

Sbjct: 14 MELIDFSSSVWSIVPALLAIILAIATRRVLVSLSAGIIIGSLMLSDWQIGSAFNYLVKNV 73 

Query: 61 VGLAWADGDWSLGKPKILVFLILLGIFTSLLTYSGSNQAFADWAKRHIKNRCGAKMLTAC 120 

V L +ADG+ + I++FL+LLG+ T+LLT SGSN+AFA+WA+ IK R GAK+L A 

Sbjct: 74 VSLVYADGEIN-SNMNIVLFLLLLGVLTALLTVSGSNRAFAEWAQSRIKGRRGAKLLAAS 132 


Query: 121 LVFVTFIDDYFHSLAVGAIARPVTDKFKVSRAKLAYILDSTASPMCVLMPVSSWGASIIA 180 

LVFVTFIDDYFHSLAVGAIARPVTD+FKVSRAKLAYILDSTA+PMCV+MPVSSWGA II 
Sbjct: 133 LVFVTFIDDYFHSLAVGAIARPVTDRFKVSRAKLAYILDSTAAPMCVMMPVSSWGAYIIT 192 

Query: 181 TLAGLLVTYKITEYTPMGTFVAMSLMNYYALFALIMVFVVAWFSFDIGSMARFEQAALNE 240 

+ GLL TY ITEYTP+G FVAMS MN+YA+F++IMVF VA+FSFDI SM R E+ AL 
Sbjct: 193 LIGGLLATYSITEYTPIGAFVAMSSMNFYAIFSIIMVFFVAYFSFDIASMVRHEKLALKN 252 

Query: 241 AQDETAASDATKGRVYALIIPVLALIASTVSAMIYTGAQA SETFSILGAFENTDVN 296 

+D+ TKG+V LI+P+L LI +TVS MIYTGA+A + FS+LG FENT V 

Sbjct: 253 TEDQLEEETGTKGQVRNLILPILVLIIATVSMMIYTGAEALAADGKVFSVLGTFENTVVG 312 

Query: 297 TSLVFGGTCGVL--AVVLCTFGTIKTADYPKAVWQGAKSMFGXXXXXXXXXXXSTVVGEM 354 
TSLV GG C ++ +++ + +Y ++ G KSM G + +VG+M 

Sbjct: 313 TSLVVGGFCSIIISTLLIILDRQVSVPEYVRSWIVGIKSMSGAIAILFFAWTINKIVGDM 372 

Query: 355 HTGDYLSTLVAGNIHPGFLPVILFLLASVMAFATGTSWGTFGIMLPIAAAMAVKVEPALI 414 

TG YLS+LV+GNI FLPVILF+L + MAF+TGTSWGTFGIMLPIAAAMA P L+ 
Sbjct: 373 QTGKYLSSLVSGNIPMQFLPVILFVLGAAMAFSTGTSWGTFGIMLPIAAAMAANAAPELL 432 






Query: 415 IPCMSAVMAGAVCGDHCSPISDTTILSSTGARCNHIDHVTSQXXXXXXXXXXXXXXXXXX 474 

+PC+SAVMAGAVCGDHCSP+SDTTILSSTGA+CNHIDHVT+Q 
Sbjct: 433 LPCLSAVMAGAVCGDHCSPVSDTTILSSTGAKCNHIDHVTTQLPYAATVATATSIGYIVV 492 



Query: 475 XXXKSALLGFGTTGIVLAVLIFLLKDK 501 

S L GF T + L V+IF +K + 
Sbjct: 493 GFTYSGLAGFAATAVSLIVIIFAVKKR 519 
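
The percentages printed in the summary line of the database hit follow directly from the counts beside them. The Python sketch below is illustrative only: it parses the summary line exactly as it appears above and recomputes each percentage from its fraction. The function name `parse_summary` is hypothetical, and rounding to the nearest whole percent is an assumption about how the search program reports these values.

```python
import re

# Summary line copied from the hit against HI1586 shown above.
SUMMARY = "Identities = 280/507 (55%), Positives = 346/507 (68%), Gaps = 7/507 (1%)"


def parse_summary(line):
    """Map each statistic name to (count, alignment length, rounded percent)."""
    stats = {}
    for name, count, length in re.findall(r"(\w+) = (\d+)/(\d+)", line):
        count, length = int(count), int(length)
        stats[name] = (count, length, round(100 * count / length))
    return stats


for name, (count, length, percent) in parse_summary(SUMMARY).items():
    print(f"{name}: {count}/{length} = {percent}%")
```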



Based on this analysis, it is predicted that these proteins from N.meningitidis and N.gonorrhoeae, 
and their epitopes, could be useful antigens for vaccines or diagnostics, or for raising antibodies. 



Example 83 

The following partial DNA sequence was identified in N.meningitidis <SEQ ID 697>: 

1 ..AAGCAATGGT ATGCCGACGN .AGTATCAAG ACGGAAATGG TTATGGTCAA 

51 CGATGAGCCT GCCAAAATTC TGACTTGGGA TGAAAGCGGC CGATTACTCT 

101 CGGAACTGTC TATCCGCCAC CATCAACGCA ACGGGGTGGT TTTGGAGTGG 

151 TATGAAGATG GTTCTAAAAA GAGCGAAGT. GTTTATCAGG ATGACAAGTT 

201 GGTCAGGAAA ACCCAGTGGG ATAAGGATGG TTATTTAATC GAACCCTGA 

This corresponds to the amino acid sequence <SEQ ID 698; ORF27>: 

1 ..KQWYADXSIK TEMVMVNDEP AKILTWDESG RLLSELSIRH HQRNGVVLEW 
51 YEDGSKKSEX VYQDDKLVRK TQWDKDGYLI EP* 

Further work revealed the complete nucleotide sequence <SEQ ID 699>: 

1 ATGAAAAAAT TATCTCGGAT TGTATTTTCA ACTGTCCTGT TGGGTTTTTC 

51 GGCCGCTTTG CCGGCGCAGA CCTATTCTGT TTATTTTAAT CAGAACGGAA 

101 AGCTGACGGC GACGATGTCT TCTGCCGCTT ATATCAGGCA ATATAGTGTG 

151 GTGGCGGGTA TTGCGCACGC GCAGGATTTT TATTATCCGT CGATGAAGAA 

201 ATATTCTGAA CCTTATATCG TTGCTTCAAC GCAAATCAAA TCTTTTGTGC 

251 CTACCCTGCA AAACGGTATG TTGATTTTGT GGCATTTTAA TGGTCAGAAA 

301 AAAATGGCGG GGGGCTTCAG CAAGGGTAAG CCGGACGGGG AGTGGGTCAA 

351 CTGGTATCCG AACGGTAAAA AATCTGCCGT TATGCCTTAT AAAAATGGCT 

401 TGAGTGAGGG TACGGGATAC CGCTATTACC GTAACGGCGG CAAGGAAAGC 

451 GAAATCCAGT TTAAGCAAAA TAAGGCAAAC GGCGTATGGA AGCAATGGTA 

501 TGCCGACGGC AGTATCAAGA CGGAAATGGT TATGGTCAAC GATGAGCCTG 

551 CCAAAATTCT GACTTGGGAT GAAAGCGGCC GATTACTCTC GGAACTGTCT 

601 ATCCGCCACC ATCAACGCAA CGGGGTGGTT TTGGAGTGGT ATGAAGATGG 

651 TTCTAAAAAG AGCGAAGCTG TTTATCAGGA TGACAAGTTG GTCAGGAAAA 

701 CCCAGTGGGA TAAGGATGGT TATTTAATCG AACCCTGA 

This corresponds to the amino acid sequence <SEQ ID 700; ORF27-1>: 



1 MKKLSRIVFS TVLLGFSAAL PAQTYSVYFN QNGKLTATMS SAAYIRQYSV 

51 VAGIAHAQDF YYPSMKKYSE PYIVASTQIK SFVPTLQNGM LILWHFNGQK 

101 KMAGGFSKGK PDGEWVNWYP NGKKSAVMPY KNGLSEGTGY RYYRNGGKES 

151 EIQFKQNKAN GVWKQWYADG SIKTEMVMVN DEPAKILTWD ESGRLLSELS 

201 IRHHQRNGVV LEWYEDGSKK SEAVYQDDKL VRKTQWDKDG YLIEP* 
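
The relationship between the partial sequence (ORF27) and the complete sequence (ORF27-1) can be checked by locating the partial sequence within the complete one while treating each undetermined residue (X) as a wildcard. The Python sketch below is illustrative only. The function name `locate_partial` is hypothetical, and both sequences are transcribed from the listings above, with residues 209-210 read as VV from the nucleotide sequence.

```python
# ORF27-1 <SEQ ID 700>, without the terminal stop symbol.
ORF27_1 = (
    "MKKLSRIVFSTVLLGFSAALPAQTYSVYFNQNGKLTATMSSAAYIRQYSV"
    "VAGIAHAQDFYYPSMKKYSEPYIVASTQIKSFVPTLQNGMLILWHFNGQK"
    "KMAGGFSKGKPDGEWVNWYPNGKKSAVMPYKNGLSEGTGYRYYRNGGKES"
    "EIQFKQNKANGVWKQWYADGSIKTEMVMVNDEPAKILTWDESGRLLSELS"
    "IRHHQRNGVVLEWYEDGSKKSEAVYQDDKLVRKTQWDKDGYLIEP"
)

# Partial ORF27 <SEQ ID 698>; X marks residues left undetermined.
ORF27 = (
    "KQWYADXSIKTEMVMVNDEPAKILTWDESGRLLSELSIRH"
    "HQRNGVVLEWYEDGSKKSEXVYQDDKLVRKTQWDKDGYLIEP"
)


def locate_partial(partial, full):
    """Return the 0-based offset of partial in full (X matches anything), or -1."""
    for offset in range(len(full) - len(partial) + 1):
        window = full[offset:offset + len(partial)]
        if all(p == "X" or p == f for p, f in zip(partial, window)):
            return offset
    return -1


offset = locate_partial(ORF27, ORF27_1)
print(offset + 1, offset + len(ORF27))  # 1-based start and end in ORF27-1
```

Under these assumptions the 82 residues of the partial sequence correspond to residues 164-245 of ORF27-1, that is, to its C-terminal region.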

Computer analysis of this amino acid sequence gave the following results: 



Homology with a predicted ORF from N.meningitidis (strain A) 

ORF27 shows 91.5% identity over an 82 aa overlap with an ORF (ORF27a) from strain A of 
N.meningitidis: 



                                                   10        20        30
orf27.pep                                  KQWYADXSIKTEMVMVNDEPAKILTWDESG
                                           |||||| :||||||||||||||||||||||
orf27a       LSEGTGXRYYRNGGKESEIQFKQNKANGVWKQWYADGNIKTEMVMVNDEPAKILTWDESG
                 140       150       160       170       180       190

                     40        50        60        70        80
orf27.pep    RLLSELSIRHHQRNGVVLEWYEDGSKKSEXVYQDDKLVRKTQWDKDGYLIEPX
             ||||||||:|| ||||||||||||||| | |||||||||||||| ||||||||
orf27a       RLLSELSIHHHXRNGVVLEWYEDGSKKXEAVYQDDKLVRKTQWDXDGYLIEPX
                 200       210       220       230       240

The complete length ORF27a nucleotide sequence <SEQ ID 701> is: 



1 ATGAAAAAAT TATCTCGGAT TGTATTTTCA ACTGTCCTGT TGGGTTTTTC 

51 GGCCGCTTTG CCGGCGCAGA NCTATTCTGT TTATTTTAAT CAGAACGGGA 

101 AACTGACGGC GACGNTGTCT TCTGCCGCNT ATATCAGGCA ATATAGTGTG 

151 GCGGAGGGTA TTGCGCACGC GCAGGANTTT TANTATCCGT CGATGAAGAA 

201 ATATTCCGAA CCTTATATCG TTGCTTCAAC GCAAATCAAA TCTTTTGTGC 

251 CTACCCTGCA AAACGGTATG TTGATTTTGT GGCATTTTAA NGGTCAGAAA 

301 AAAATGGCNG GGGGCTTCAG CAAGGGTAAG CCGGACGGGG AGTGGGTCAA 

351 CTGGTATCCG AACGGTAAAA AATCTGCCGT TATGCCTTAT AAAAATGGTT 

401 TGAGTGAAGG TACGGGGTNN CGCTATTACC GTAACGGCGG CAAGGAAAGC 

451 GAAATCCAGT TTAAACAGAA TAAGGCAAAC GGCGTATGGA AGCAATGGTA 

501 TGCCGACGGC AATATCAAAA CGGAAATGGT TATGGTCAAT GATGAGCCTG 

551 CCAAAATTCT GACATGGGAT GAAAGCGGTC GATTACTCTC GGAACTGTCT 

601 ATCCATCATC ATNAACGTAA TGGAGTAGTC TTAGAGTGGT ATGAAGATGG 

651 TTCTAAAAAG ANTGAAGCTG TTTATCAGGA TGATAAGTTG GTCAGGAAAA 

701 CCCAGTGGGA TAANGATGGT TATTTAATCG AACCCTGA 

This encodes a protein having amino acid sequence <SEQ ID 702>: 



1 MKKLSRIVFS TVLLGFSAAL PAQXYSVYFN QNGKLTATXS SAAYIRQYSV 

51 AEGIAHAQXF XYPSMKKYSE PYIVASTQIK SFVPTLQNGM LILWHFXGQK 

101 KMAGGFSKGK PDGEWVNWYP NGKKSAVMPY KNGLSEGTGX RYYRNGGKES 

151 EIQFKQNKAN GVWKQWYADG NIKTEMVMVN DEPAKILTWD ESGRLLSELS 

201 IHHHXRNGVV LEWYEDGSKK XEAVYQDDKL VRKTQWDXDG YLIEP* 

ORF27a and ORF27-1 show 94.7% identity in 245 aa overlap: 



                     10        20        30        40        50        60
orf27a.pep   MKKLSRIVFSTVLLGFSAALPAQXYSVYFNQNGKLTATXSSAAYIRQYSVAEGIAHAQXF
             |||||||||||||||||||||||:|||||||||||||| |||||||||||: |||||| |
orf27-1      MKKLSRIVFSTVLLGFSAALPAQTYSVYFNQNGKLTATMSSAAYIRQYSVVAGIAHAQDF
                     10        20        30        40        50        60

                     70        80        90       100       110       120
orf27a.pep   XYPSMKKYSEPYIVASTQIKSFVPTLQNGMLILWHFXGQKKMAGGFSKGKPDGEWVNWYP
              ||||||||||||||||||||||||||||||||||| |||||||||||||||||||||||
orf27-1      YYPSMKKYSEPYIVASTQIKSFVPTLQNGMLILWHFNGQKKMAGGFSKGKPDGEWVNWYP
                     70        80        90       100       110       120

                    130       140       150       160       170       180
orf27a.pep   NGKKSAVMPYKNGLSEGTGXRYYRNGGKESEIQFKQNKANGVWKQWYADGNIKTEMVMVN
             ||||||||||||||||||| ||||||||||||||||||||||||||||||:|||||||||
orf27-1      NGKKSAVMPYKNGLSEGTGYRYYRNGGKESEIQFKQNKANGVWKQWYADGSIKTEMVMVN
                    130       140       150       160       170       180

                    190       200       210       220       230       240
orf27a.pep   DEPAKILTWDESGRLLSELSIHHHXRNGVVLEWYEDGSKKXEAVYQDDKLVRKTQWDXDG
             |||||||||||||||||||||:|| ||||||||||||||| |||||||||||||||| ||
orf27-1      DEPAKILTWDESGRLLSELSIRHHQRNGVVLEWYEDGSKKSEAVYQDDKLVRKTQWDKDG
                    190       200       210       220       230       240

orf27a.pep   YLIEPX
             ||||||
orf27-1      YLIEPX

Homology with a predicted ORF from N.gonorrhoeae 

ORF27 shows 96.3% identity over 82 aa overlap with a predicted ORF (ORF27ng) from 
N.gonorrhoeae: 

orf27.pep                                  KQWYADXSIKTEMVMVNDEPAKILTWDESG  30
                                           |||||| |||||||||||||||||||||||
orf27ng      LSEGTGYRYYRNGGKESEIQFKQNKANGVWKQWYADGSIKTEMVMVNDEPAKILTWDESG  193

orf27.pep    RLLSELSIRHHQRNGVVLEWYEDGSKKSEXVYQDDKLVRKTQWDKDGYLIEP  82
             |||||||||||:||||||||||||||||| ||||||||||||||||||||||
orf27ng      RLLSELSIRHHKRNGVVLEWYEDGSKKSEAVYQDDKLVRKTQWDKDGYLIEP  245

The complete length ORF27ng nucleotide sequence <SEQ ID 703> is: 

1 ATGAAGAAAT TATCTCGGAT TGTATTTTCA ATCGTACTGT TGGGTTTTTC 

51 GGCCGCTTTG CCGGCGCAGA CCTATTCTGT TTATTTTAAT CAGAACGGGA 

101 AACTGACGGC GACGATGTCT TCTGCCGCTT ATATCAGGCA ATATAGTGTG 

151 GCGGCGGGTA TCGCACACGC GCAGGATTTT TATTATCCGT CGATGAAGAA 

201 ATATTCCGAA CCTTATATCG TTGCTTCAAC GCAAATCAAA TCTTTTGTGC 

251 CTACCCTGCA AAACGGTATG TTGATTTTGT GGCATTTTAA TGGTCAGAAA 

301 AAAATGGCGG GGGGCTTCAG CAAGGGTAAG CCGGACGGGG AATGGGTCAA 

351 CTGGTATCCG AACGGTAAAA AATCTGCGGT TATGCCTTAT AAAAATGGCT 

401 TGAGTGAGGG TACGGGATAC CGTTATTACC GTAACGGCGG CAAGGAAAGC 

451 GAAATCCAGT TTAAGCAAAA TAAGGCGAAC GGCGTATGGA AGCAATGGTA 

501 TGCCGATGGA AGTATCAAGA CGGAAATGGT TATGGTCAAC GATGAGCCTG 

551 CCAAAATTCT GACTTGGGAT GAAAGCGGCC GATTACTTTC GGAACTGTCT 

601 ATCCGCCACC ATAAACGCAA CGGGGTGGTT TTGGAGTGGT ATGAAGATGG 

651 TTCTAAAAAG AGCGAGGCTG TTTATCAGGA TGACAAGTTG GTCAGGAAAA 

701 CCCAATGGGA TAAGGATGGT TATTTAATCG AACCCTGA 

This encodes a protein having amino acid sequence <SEQ ID 704>: 

1 MKKLSRIVFS IVLLGFSAAL PAQTYSVYFN QNGKLTATMS SAAYIRQYSV 

51 AAGIAHAQDF YYPSMKKYSE PYIVASTQIK SFVPTLQNGM LILWHFNGQK 

101 KMAGGFSKGK PDGEWVNWYP NGKKSAVMPY KNGLSEGTGY RYYRNGGKES 

151 EIQFKQNKAN GVWKQWYADG SIKTEMVMVN DEPAKILTWD ESGRLLSELS 

201 IRHHKRNGVV LEWYEDGSKK SEAVYQDDKL VRKTQWDKDG YLIEP* 

ORF27ng and ORF27-1 show 98.8% identity in 245 aa overlap: 

                     10        20        30        40        50        60
orf27-1.pep  MKKLSRIVFSTVLLGFSAALPAQTYSVYFNQNGKLTATMSSAAYIRQYSVVAGIAHAQDF
             |||||||||| |||||||||||||||||||||||||||||||||||||||:|||||||||
orf27ng      MKKLSRIVFSIVLLGFSAALPAQTYSVYFNQNGKLTATMSSAAYIRQYSVAAGIAHAQDF
                     10        20        30        40        50        60

                     70        80        90       100       110       120
orf27-1.pep  YYPSMKKYSEPYIVASTQIKSFVPTLQNGMLILWHFNGQKKMAGGFSKGKPDGEWVNWYP
             ||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
orf27ng      YYPSMKKYSEPYIVASTQIKSFVPTLQNGMLILWHFNGQKKMAGGFSKGKPDGEWVNWYP
                     70        80        90       100       110       120

                    130       140       150       160       170       180
orf27-1.pep  NGKKSAVMPYKNGLSEGTGYRYYRNGGKESEIQFKQNKANGVWKQWYADGSIKTEMVMVN
             ||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
orf27ng      NGKKSAVMPYKNGLSEGTGYRYYRNGGKESEIQFKQNKANGVWKQWYADGSIKTEMVMVN
                    130       140       150       160       170       180

                    190       200       210       220       230       240
orf27-1.pep  DEPAKILTWDESGRLLSELSIRHHQRNGVVLEWYEDGSKKSEAVYQDDKLVRKTQWDKDG
             ||||||||||||||||||||||||:|||||||||||||||||||||||||||||||||||
orf27ng      DEPAKILTWDESGRLLSELSIRHHKRNGVVLEWYEDGSKKSEAVYQDDKLVRKTQWDKDG
                    190       200       210       220       230       240

orf27-1.pep  YLIEPX
             ||||||
orf27ng      YLIEPX

Based on this analysis, including the putative leader sequence in the gonococcal protein, it was 
predicted that the proteins from N.meningitidis and N.gonorrhoeae, and their epitopes, could be 
useful antigens for vaccines or diagnostics, or for raising antibodies. 

ORF27-1 (24.5kDa) was cloned in pET and pGex vectors and expressed in E.coli, as described 
above. The products of protein expression and purification were analyzed by SDS-PAGE. Figure 
17A shows the results of affinity purification of the GST-fusion protein, and Figure 17B shows the 
results of expression of the His-fusion in E.coli. Purified GST-fusion protein was used to immunise 
mice, whose sera were used for ELISA, which gave a positive result, confirming that ORF27-1 is 
a surface-exposed protein and a useful immunogen. 

Example 84 

The following partial DNA sequence was identified in N.meningitidis <SEQ ID 705>: 

1 ATGAAATTTA CCAAGCACCC CGTCTGGGCA ATGGCGTTCC GCCCATTTTA 

51 TTCGCTGGCG GCTCTGTACG GCGCATTGTC CGTATTGCTG TGGGGTTTCG 

101 GCTACACGGG AACGCACkAG CTGTCCGGTT TCTATTGGCA CGCGCATGAg 

151 ATGATTTGGG GTTATGCCGG ACTGGTCGTC ATCGCCTTCC TGCTGACCGC 

201 CGTCGCCACT TGGACGGGGC AGCCGCCCAC GCGGGGCGGC GTaTCTGGTC 

251 GGCTTGACTA TCTTTTGGCT GGCTGCGCGG ATTGCCGCCT TTATCCCGGG 

301 TTGGGGTGCG TCGGCAAGCG GCATACTCGG TACGCTGTTT TTCTGGTACG 

351 GCGCGGTGTG CATGGCTTTG CCCGTTATCC GTTCGCAGAA TCAACGCAAC 

401 TATGTTgCCG TGTTCGCGCT GTTCGTCTTG GGCGGCACGC ATGCGGCGTT 

451 CCACGTCCAG CTGCACAACG GCAACCTAGG CGGACTCTTG AGCGGATTGC 

501 AGTCGGGCTT GGTGATG 

This corresponds to the amino acid sequence <SEQ ID 706; ORF47>: 

1 MKFTKHPVWA MAFRPFYSLA ALYGALSVLL WGFGYTGTHX LSGFYWHAHE 

51 MIWGYAGLVV IAFLLTAVAT WTGQPPTRGG VLVGLTIFWL AARIAAFIPG 

101 WGASASGILG TLFFWYGAVC MALPVIRSQN QRNYVAVFAL FVLGGTHAAF 

151 HVQLHNGNLG GLLSGLQSGL VM 

Further work revealed the complete nucleotide sequence <SEQ ID 707>: 



1 ATGAAATTTA CCAAGCACCC CGTCTGGGCA ATGGCGTTCC GCCCATTTTA 

51 TTCGCTGGCG GCTCTGTACG GCGCATTGTC CGTATTGCTG TGGGGTTTCG 

101 GCTACACGGG AACGCACGAG CTGTCCGGTT TCTATTGGCA CGCGCATGAG 

151 ATGATTTGGG GTTATGCCGG ACTGGTCGTC ATCGCCTTCC TGCTGACCGC 

201 CGTCGCCACT TGGACGGGGC AGCCGCCCAC GCGGGGCGGC GTTCTGGTCG 

251 GCTTGACTAT CTTTTGGCTG GCTGCGCGGA TTGCCGCCTT TATCCCGGGT 

301 TGGGGTGCGT CGGCAAGCGG CATACTCGGT ACGCTGTTTT TCTGGTACGG 

351 CGCGGTGTGC ATGGCTTTGC CCGTTATCCG TTCGCAGAAT CAACGCAACT 

401 ATGTTGCCGT GTTCGCGCTG TTCGTCTTGG GCGGCACGCA TGCGGCGTTC 

451 CACGTCCAGC TGCACAACGG CAACCTAGGC GGACTCTTGA GCGGATTGCA 

501 GTCGGGCTTG GTGATGGTGT CGGGTTTTAT CGGTCTGATT GGTACGCGGA 

551 TTATTTCGTT TTTTACGTCC AAACGCTTGA ATGTGCCGCA GATTCCCAGT 

601 CCGAAATGGG TGGCGCAGGC TTCGCTGTGG CTGCCCATGC TGACTGCCAT 

651 GCTGATGGCG CACGGTGTGT TGGCTTGGCT GTCTGCCGTT TTTGCCTTTG 

701 CGGCAGGTGT GATTTTTACC GTGCAGGTGT ACCGCTGGTG GTATAAACCC 

751 GTGTTGAAAG AGCCGATGCT GTGGATTCTG TTTGCCGGCT ATCTGTTTAC 

801 CGGATTGGGG CTGATTGCGG TCGGCGCGTC TTATTTCAAA CCCGCTTTCC 

851 TCAATCTGGG TGTGCATCTG ATCGGGGTCG GCGGTATCGG CGTGCTGACT 

901 TTGGGCATGA TGGCGCGTAC CGCGCTTGGT CATACGGGCA ATCCGATTTA 

951 TCCGCCGCCC AAAGCCGTTC CCGTTGCGTT TTGGCTGATG ATGGCGGCAA 

1001 CCGCCGTCCG TATGGTTGCC GTATTTTCTT CCGGCACTGC CTACACGCAC 

1051 AGCATCCGCA CCTCTTCGGT TTTGTTTGCA CTCGCGCTTT TGGTGTATGC 

1101 GTGGAAGTAT ATTCCTTGGC TGATTCGTCC GCGTTCGGAC GGCAGGCCCG 

1151 GTTGA 

This corresponds to the amino acid sequence <SEQ ID 708; ORF47-1>: 

1 MKFTKHPVWA MAFRPFYSLA ALYGALSVLL WGFGYTGTHE LSGFYWHAHE 

51 MIWGYAGLVV IAFLLTAVAT WTGQPPTRGG VLVGLTIFWL AARIAAFIPG 

101 WGASASGILG TLFFWYGAVC MALPVIRSQN QRNYVAVFAL FVLGGTHAAF 

151 HVQLHNGNLG GLLSGLQSGL VMVSGFIGLI GTRIISFFTS KRLNVPQIPS 

201 PKWVAQASLW LPMLTAMLMA HGVLAWLSAV FAFAAGVIFT VQVYRWWYKP 

251 VLKEPMLWIL FAGYLFTGLG LIAVGASYFK PAFLNLGVHL IGVGGIGVLT 

301 LGMMARTALG HTGNPIYPPP KAVPVAFWLM MAATAVRMVA VFSSGTAYTH 

351 SIRTSSVLFA LALLVYAWKY IPWLIRPRSD GRPG* 

Computer analysis of this amino acid sequence predicts a leader peptide and also gave the 
following results: 

Homology with a predicted ORF from N.meningitidis (strain A) 

ORF47 shows 99.4% identity over a 172 aa overlap with an ORF (ORF47a) from strain A of 
N.meningitidis: 

                     10        20        30        40        50        60
orf47.pep    MKFTKHPVWAMAFRPFYSLAALYGALSVLLWGFGYTGTHXLSGFYWHAHEMIWGYAGLVV
             ||||||||||||||||||||||||||||||||||||||| ||||||||||||||||||||
orf47a       MKFTKHPVWAMAFRPFYSLAALYGALSVLLWGFGYTGTHELSGFYWHAHEMIWGYAGLVV
                     10        20        30        40        50        60

                     70        80        90       100       110       120
orf47.pep    IAFLLTAVATWTGQPPTRGGVLVGLTIFWLAARIAAFIPGWGASASGILGTLFFWYGAVC
             ||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
orf47a       IAFLLTAVATWTGQPPTRGGVLVGLTIFWLAARIAAFIPGWGASASGILGTLFFWYGAVC
                     70        80        90       100       110       120

                    130       140       150       160       170
orf47.pep    MALPVIRSQNQRNYVAVFALFVLGGTHAAFHVQLHNGNLGGLLSGLQSGLVM
             ||||||||||||||||||||||||||||||||||||||||||||||||||||
orf47a       MALPVIRSQNQRNYVAVFALFVLGGTHAAFHVQLHNGNLGGLLSGLQSGLVMVSGFIGLI
                    130       140       150       160       170       180

orf47a       GTRIISFFTSKRLNVPQIPSPKWVAQASLWLPMLTAMLMAHGVMPWLSAAFAFAAGVIFT
                    190       200       210       220       230       240
The complete length ORF47a nucleotide sequence <SEQ ID 709> is: 

1 ATGAAATTTA CCAAGCACCC CGTTTGGGCA ATGGCGTTCC GCCCGTTTTA 

51 TTCACTGGCG GCTCTGTACG GCGCATTGTC CGTATTGCTG TGGGGTTTCG 

101 GCTACACGGG AACGCACGAG CTGTCCGGTT TCTATTGGCA CGCGCATGAG 

151 ATGATTTGGG GTTATGCCGG ACTGGTCGTC ATCGCCTTCC TGCTGACCGC 

201 CGTCGCCACT TGGACGGGGC AGCCGCCCAC GCGGGGCGGC GTTCTGGTCG 

251 GCTTGACTAT CTTTTGGCTG GCTGCGCGGA TTGCCGCCTT TATCCCGGGT 

301 TGGGGTGCGT CGGCAAGCGG CATACTCGGT ACGCTGTTTT TCTGGTACGG 

351 CGCGGTGTGC ATGGCTTTGC CCGTTATCCG TTCGCAGAAT CAACGCAATT 

401 ATGTTGCCGT GTTCGCGCTG TTCGTCTTGG GCGGTACGCA CGCGGCGTTC 

451 CACGTCCAGC TGCACAACGG CAACCTAGGC GGACTCTTGA GCGGATTGCA 

501 GTCGGGCTTG GTGATGGTGT CGGGTTTTAT CGGTCTGATT GGTACGCGGA 

551 TTATTTCGTT TTTTACGTCC AAACGGTTGA ATGTGCCGCA GATTCCCAGT 

601 CCGAAATGGG TGGCGCAGGC TTCGCTGTGG CTGCCCATGC TGACCGCCAT 

651 GCTGATGGCG CACGGCGTGA TGCCTTGGCT GTCGGCGGCT TTCGCGTTTG 

701 CGGCAGGTGT GATTTTTACC GTGCAGGTGT ACCGCTGGTG GTATAAGCCT 

751 GTGTTGAAAG AGCCGATGCT GTGGATTCTG TTTGCCGGCT ATCTGTTTAC 

801 CGGATTGGGG CTGATTGCGG TCGGCGCGTC TTATTTCAAA CCCGCTTTCC 

851 TCAATCTGGG TGTGCATCTG ATCGGGGTCG GCGGTATCGG CGTGCTGACT 

901 TTGGGCATGA TGGCGCGTAC CGCGCTCGGT CATACGGGCA ATCCGATTTA 

951 TCCGCCGCCC AAAGCCGTTC CCGTTGCGTT TTGGCTGATG ATGGCGGCAA 

1001 CCGCCGTCCG TATGGTTGCC GTATTTTCTT CCGGCACTGC CTACACGCAC 

1051 AGCATACGCA CCTCTTCGGT TTTGTTTGCA CTCGCGCTTT TGGTGTATGC 

1101 GTGGAAGTAT ATTCCTTGGC TGATTCGTCC GCGTTCGGAC GGCAGGCCCG 

1151 GTTGA 

This encodes a protein having amino acid sequence <SEQ ID 710>: 

1 MKFTKHPVWA MAFRPFYSLA ALYGALSVLL WGFGYTGTHE LSGFYWHAHE 

51 MIWGYAGLVV IAFLLTAVAT WTGQPPTRGG VLVGLTIFWL AARIAAFIPG 

101 WGASASGILG TLFFWYGAVC MALPVIRSQN QRNYVAVFAL FVLGGTHAAF 

151 HVQLHNGNLG GLLSGLQSGL VMVSGFIGLI GTRIISFFTS KRLNVPQIPS 

201 PKWVAQASLW LPMLTAMLMA HGVMPWLSAA FAFAAGVIFT VQVYRWWYKP 

251 VLKEPMLWIL FAGYLFTGLG LIAVGASYFK PAFLNLGVHL IGVGGIGVLT 

301 LGMMARTALG HTGNPIYPPP KAVPVAFWLM MAATAVRMVA VFSSGTAYTH 

351 SIRTSSVLFA LALLVYAWKY IPWLIRPRSD GRPG* 

ORF47a and ORF47-1 show 99.2% identity in 384 aa overlap: 

10 20 30 40 50 60 

MKFTKHPVWAMAFRPFYSLAALYGALSVLLWGFGYTGTHELSGFYWHAHEMIWGYAGLW 
I I I I I I I I I i I I I I I I I M I t I I I M I I I t I I t I It t I I I I I I I I I I I I I I I I I t i t I I t 
MKFTKHPVWAMAFRPFYSLAALYGALSVLLWGFGYTGTHELSGFYWHAHEMIWGYAGLW 
10 20 30 40 50 60 

70 80 90 100 110 120 

lAFLLTAVATWTGQPPTRGGVLVGLTIFWLAARIAAFIPGWGASASGILGTLFFWYGAVC 
MllltllilllllllllllllllllMlinillltllitMlllllitlltlltlin 
lAFLLTAVATWTGQPPTRGGVLVGLTIFWLAARIAAFIPGWGASASGILGTLFFWYGAVC 
70 80 90 100 110 120 

130 140 150 160 170 180 

MALPVIRSQNQRNYVAVFALFVLGGTHAAFHVQLHNGNLGGLLSGLQSGLVMVSGFIGLI 
I I t I M I I I I I t I I I I t I I I I I M i t I I I I I I I t M I t I t t I I I I I I I I M I M I I I I I i 
MALPVIRSQNQRNYVAVFALFVLGGTHAAFHVQLHNGNLGGLLSGLQSGLVMVSGFIGLI 
130 140 150 160 170 180 

190 200 210 220 230 240 

GTRIISFFTSKRLNVPQIPSPKWVAQASLWLPMLTAMU4AHGVMPWLSAAFAFAAGVIFT 
iillillltllllliltllllltMlillllllllllllllll: llll:IIIIIIMII 
GTRI I S FFTSKRLNVPQI PSPKWVAQASLWLPMLTAMLMAHGVLAWLSAVFAFAAGVI FT 
190 200 210 220 230 240 

250 260 270 280 290 300 

VQVYRWWYKPVLKEPMLWILFAGYLFTGLGLIAVGASYFKPAFLNLGVHLIGVGGIGVLT 
1 I t 1 M I I I I I I M I I I t I t I I I I I I I I i I I I I I I I I I I t I M I I I M I I I t I 1 I I M t I 
VQVYRWWYKPVLKEPMLWILFAGYLFTGLGLIAVGASYFKPAFLNLGVHLIGVGGIGVLT 
250 260 270 280 290 300 



orf 47a .pep 
orf47-l 

orf 47a -pep 
orf47-l 

orf 47a. pep 
orf47-l 

orf 47a. pep 
orf47-l 

orf 47a. pep 
orf47-l 



                   310       320       330       340       350       360
orf47a.pep  LGMMARTALGHTGNPIYPPPKAVPVAFWLMMAATAVRMVAVFSSGTAYTHSIRTSSVLFA
            ||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
orf47-1     LGMMARTALGHTGNPIYPPPKAVPVAFWLMMAATAVRMVAVFSSGTAYTHSIRTSSVLFA

                   370       380
orf47a.pep  LALLVYAWKYIPWLIRPRSDGRPGX
            |||||||||||||||||||||||||
orf47-1     LALLVYAWKYIPWLIRPRSDGRPGX
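
The identity figures quoted for pairwise comparisons such as the one above can be reproduced by simple counting. The following Python sketch is illustrative only and is not part of the original analysis: the function name percent_identity is hypothetical, it assumes an ungapped alignment of equal-length sequences, and the two fragments are residues 221-240 of ORF47a and ORF47-1 as printed in the alignment.

```python
def percent_identity(seq_a: str, seq_b: str) -> float:
    """Percentage of identical positions in an ungapped, equal-length alignment."""
    if len(seq_a) != len(seq_b):
        raise ValueError("sequences must be aligned to the same length")
    matches = sum(a == b for a, b in zip(seq_a, seq_b))
    return 100.0 * matches / len(seq_a)


# Residues 221-240, copied from the ORF47a / ORF47-1 alignment (assumed OCR-clean).
orf47a_fragment = "HGVMPWLSAAFAFAAGVIFT"
orf47_1_fragment = "HGVLAWLSAVFAFAAGVIFT"

fragment_identity = percent_identity(orf47a_fragment, orf47_1_fragment)

# Over the whole 384 aa overlap, the three substitutions visible in the
# alignment (positions 224, 225 and 230) give 381 identical residues.
overall_identity = round(100.0 * (384 - 3) / 384, 1)

print(fragment_identity, overall_identity)
```

Three substitutions in 384 positions give 381/384, which rounds to the 99.2% stated in the text.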

Homology with a predicted ORF from N.gonorrhoeae

ORF47 shows 97.1% identity over 172 aa overlap with a predicted ORF (ORF47ng) from 
N.gonorrhoeae: 

ORF47    MKFTKHPVWAMAFRPFYSLAALYGALSVLLWGFGYTGTHELSGFYWHAHEMIWGYAGLVV 60
         ||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
ORF47ng  MKFTKHPVWAMAFRPFYSLAALYGALSVLLWGFGYTGTHELSGFYWHAHEMIWGYAGLVV 60

ORF47    IAFLLTAVATWTGQPPTRGGVLVGLTIFWLAARIAAFIPGWGASASGILGTLFFWYGAVC 120
         |||||||||||||||||||||||||| ||||||||||||||||:||||||||||||||||
ORF47ng  IAFLLTAVATWTGQPPTRGGVLVGLTAFWLAARIAAFIPGWGAAASGILGTLFFWYGAVC 120

ORF47    MALPVIRSQNQRNYVAVFALFVLGGTHAAFHVQLHNGNLGGLLSGLQSGLVM 172
         ||||||||||:||||||||:||||||||||||||||||||||||||||||||
ORF47ng  MALPVIRSQNRRNYVAVFAIFVLGGTHAAFHVQLHNGNLGGLLSGLQSGLVMVWGFIGLI 180

The ORF47ng nucleotide sequence <SEQ ID 711> is predicted to encode a protein comprising 
amino acid sequence <SEQ ID 712>: 

1 MKFTKHPVWA MAFRPFYSLA ALYGALSVLL WGFGYTGTHE LSGFYWHAHE 

51 MIWGYAGLVV IAFLLTAVAT WTGQPPTRGG VLVGLTAFWL AARIAAFIPG 

101 WGAAAS GILG TLFFWYGAVC MAL PVIRSQN RR NYVAVFAI FVLGGTHAAF 

151 HVQLHNGNLG GLLSGLQS GL VMVWGFIGLI GMKII SFFTS KRLKLPQIPS 

201 PKWVAHASLW LPMLNAILMA HRVMP WLSAA FPFAAGVIFT VQVY AGGXTP 

251 IEETSC6SVA GICYRLGNSS G 

The predicted leader peptide and transmembrane domains are identical (except for an Ile/Ala 
substitution at residue 87 and a Leu/Ile substitution at position 140) to sequences in the 
meningococcal protein (see also Pseudomonas stutzeri ORF396, accession number e246540): 

TM segments in ORF47ng 

INTEGRAL    Likelihood = -5.63    Transmembrane    52 -  68
INTEGRAL    Likelihood = -3.88    Transmembrane   169 - 185
INTEGRAL    Likelihood = -3.08    Transmembrane    82 -  98
INTEGRAL    Likelihood = -1.91    Transmembrane   134 - 150
INTEGRAL    Likelihood = -1.44    Transmembrane   107 - 123
INTEGRAL    Likelihood = -1.38    Transmembrane   227 - 243
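
The transmembrane predictions listed above are reported in order of likelihood score rather than position. As a purely illustrative Python sketch (the tuple layout and function names are assumptions made for this example, not output of the prediction program), the segments can be re-ordered along the protein and their lengths checked:

```python
# (likelihood, first residue, last residue) for each predicted segment of ORF47ng,
# transcribed from the list above.
TM_SEGMENTS = [
    (-5.63, 52, 68),
    (-3.88, 169, 185),
    (-3.08, 82, 98),
    (-1.91, 134, 150),
    (-1.44, 107, 123),
    (-1.38, 227, 243),
]


def by_position(segments):
    """Return the segments sorted from N-terminus to C-terminus."""
    return sorted(segments, key=lambda seg: seg[1])


def segment_length(segment):
    """Number of residues in a segment (inclusive coordinates)."""
    _, first, last = segment
    return last - first + 1


def overlapping_pairs(segments):
    """Adjacent segments (in positional order) whose residue ranges overlap."""
    ordered = by_position(segments)
    return [(a, b) for a, b in zip(ordered, ordered[1:]) if b[1] <= a[2]]


for seg in by_position(TM_SEGMENTS):
    print(seg[1], seg[2], segment_length(seg), seg[0])
```

Each listed segment spans 17 residues, and none of the ranges overlap.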



Further work revealed the complete gonococcal DNA sequence <SEQ ID 713>: 

1 ATGAAATTTA CCAAACATCC CGTCTGGGCA ATGGCGTTCC GCCCGTTTTA 

51 TTCACTGGCG GCACTGTACG GCGCATTGTC CGTATTGCTG TGGGGTTTCG 

101 GCTACACGGG AACGCACGAG CTGTCCGGTT TCTATTGGCA CGCGCATGAG 

151 ATGATTTGGG GTTATGCCGG TCTCGTCGTC ATCGCCTTCC TGCTGACCGC 

201 CGTCGCCACT TGGACGGGAC AGCCGCCCAC GAGGGGCGGC GTTCTGGTCG 

251 GCTTGACCGC CTTTTGGCTG GCTGCGCGGA TTGCCGCCTT TATCCCGGGT 

301 TGGGGTGCGG CGGCAAGCGG CATACTCGGT ACGCTGTTTT TCTGGTACGG 

351 CGCGGTGTGC ATGGCTTTGC CCGTTATCCG TtcgCAAAAC CGGCGCAACT 

401 ATGtcgCCGT ATTCGCAATA TTTGTGCTGG GCGGTACGCA TGCGgcgTTC 

451 CACGtccAgc tGCACAACGG CAACCTAGGC GGACTCTTGA GCGGATTGCA 

501 GTCGGGCCTG GTTATGGTGT CGGGCTTTAT CGGCCTGATT GGGATGAGGA 

551 TTATTTCGTT TTTTACGTCC AAACGGTTGA ACGTGCCGCA GATTCCCAGT 

601 CCGAAATGGG TGGCGCAGGC TTCGCTGTGG CTACCCATGC TGACCGCCAT 
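651 ACTGATGGCG CACGGCGTGA TGCCTTGGCT GTCGGCGGCT TTCGCGTTTG 

701 CGGCGGGCGT GATTTTTACC GTACAGGTGT ACCGCTGGTG GTATAAACCC 

751 GTATTGAAAG AACCGATGCT GTGGATTCTG TTTGCCGGCT ATCTGTTTAC 

801 CGGATTGGGG CTGATTGCGG TCGGCGCGTC TTATTTCAAA CCTGCCTTCC 

851 TCAATCTGGG CGTACATCTG ATCGGGGTCG GCGGTATCGG CGTGCTGACT 

901 TTGGGCATGA TGGCGCGTAC CGCGCTCGGT CATACGGGCA ATTCGATTTA 

951 TCCGCCGCCC AAAGCCGTTC CCGTTGCGTT TTGGCTGATG ATGGCGGCAA 

1001 CCGCCGTCCG TATGGTTGCC GTATTTTCTT CCGGCACTGC CTACACGCAC 

1051 AGCATCCGCA CGTCTTCGGT TTTGTTTGCA CTCGCGCTGC TGGTGTATGC 

1101 GTGGAAATAC ATTCCGTGGC TGATCCGTCC GCGTTCGGAC GGCAGGCCCG 

1151 GTTGA 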

This encodes a protein having amino acid sequence <SEQ ID 714; ORF47ng-1>: 

1 MKFTKHPVWA MAFRPFYSLA ALYGALSVLL WGFGYTGTHE LSGFYWHAHE 

51 MIWGYAGLVV IAFLLTAVAT WTGQPPTRGG VLVGLTAFWL AARIAAFIPG 

101 WGAAASGILG TLFFWYGAVC MALPVIRSQN RRNYVAVFAI FVLGGTHAAF 

151 HVQLHNGNLG GLLSGLQSGL VMVSGFIGLI GMRIISFFTS KRLNVPQIPS 

201 PKWVAQASLW LPMLTAILMA HGVMPWLSAA FAFAAGVIFT VQVYRWWYKP 

251 VLKEPMLWIL FAGYLFTGLG LIAVGASYFK PAFLNLGVHL IGVGGIGVLT 

301 LGMMARTALG HTGNSIYPPP KAVPVAFWLM MAATAVRMVA VFSSGTAYTH 

351 SIRTSSVLFA LALLVYAWKY IPWLIRPRSD GRPG* 
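
The relationship between a nucleotide listing and the amino acid sequence it encodes can be checked by translating the DNA with the standard genetic code. The Python sketch below is a minimal illustration and not the software used in the original analysis; the names CODON_TABLE and translate are hypothetical, and the two inputs are the first 30 and last 15 nucleotides of SEQ ID 713 as printed above.

```python
from itertools import product

# Standard genetic code, codons enumerated in T, C, A, G order at each position.
BASES = "TCAG"
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    "".join(codon): aa for codon, aa in zip(product(BASES, repeat=3), AMINO_ACIDS)
}


def translate(dna: str) -> str:
    """Translate an in-frame DNA string; whitespace and letter case are ignored.

    Translation stops after the first stop codon, which is written as '*'.
    """
    dna = "".join(dna.split()).upper()
    protein = []
    for i in range(0, len(dna) - 2, 3):
        residue = CODON_TABLE[dna[i:i + 3]]
        protein.append(residue)
        if residue == "*":
            break
    return "".join(protein)


# First 30 nucleotides of SEQ ID 713 (positions 1-30).
n_terminus = translate("ATGAAATTTA CCAAACATCC CGTCTGGGCA")
# Last 15 nucleotides of SEQ ID 713 (positions 1141-1155, in frame).
c_terminus = translate("GGCAGGCCCG GTTGA")

print(n_terminus, c_terminus)
```

The two results correspond to the first ten residues and the final GRPG* of SEQ ID 714.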

ORF47ng-1 and ORF47-1 show 97.4% identity in 384 aa overlap: 

10 20 30 40 50 60 

orf 4 7-1 . pep MKFTKHPVWAMAFRPFYSLAALYGALSVLLWGFGYTGTHELSGFYWHAHEMIWGYAGLW 
I I I I M I t I t I I I I t t I It I I t i t I I I I I I i I 1 I I I 1 I I I I I 1 I M I I t I i I I t I I I I I t 
orf47ng-l MKFTKHPVWAMAFRPFYSLAALYGALSVLLWGFGYTGTHELSGFYWHAHEMIWGYAGLW 

10 20 30 40 50 60 

70 80 90 100 110 120 

orf 47-1 . pep lAFLLTAVATWTGQPPTRGGVLVGLTIFWLAARIAAFIPGWGASASGILGTLFFWYGAVC 
I I I I I I I t t I I I t I I I I t I I I it I I I ttttttltttllMt|:ltttttllittitltt 
orf 4 7ng- 1 lAFLLTAVATWTGQPPTRGGVLVGLTAFWLAARIAAFI PGWG7UUVSGILGTLFFWYGAVC 

70 80 90 100 110 120 

130 140 150 160 170 180 

orf 47-1 . pep MALPVIRSQNQRNYVAVFALFVLGGTHAAFHVQLHNGNLGGLLSGLQSGLVMVSGFIGLI 
t I I I I II t t I : I t t t t t II : t I t I I t t I I I i I t I I t I t t t I I t I I t I t It I t ) I 1 t I t II 
orf47ng-l MALPVIRSQNRRNYVAVFAIFVLGGTHAAFHVQLHNGNLGGLLSGLQSGLVMVSGFIGLI 

130 140 150 160 170 180 

190 200 210 220 230 240 

orf 47-1 . pep GTRIISFFTSKRLNVPQIPSPKWVAQASLWLPMLTAMLMAHGVLAWLSAVFAFAAGVIFT 

I I I I t I I I I i I I I I I I 1 I I I I I II t I I It i I I 1 I t : I II I II : 1111:1111111111 
orf47ng-l GMRIISFFTSKRLNVPQIPSPKWVAQASLWLPMLTAILMAHGVMPWLSAAFAFAAGVIFT 

190 200 210 220 230 240 

250 260 270 280 290 300 

orf 47-1 . pep VQVYRWW YK P VLKEPMLWIL FAG YLFTGLGL I AVGASYFKPAFLNLGVHL IGVGGIGVLT 

II t I I I I I II I I II I I I I II I II I t I I I II I I I I M I I I I I I I I t I I I I I M II I I I I I I 
orf47ng-l VQV YRWW YK PVLBCE PMLW I L FAG YLFTGLGL I AVGASYFKPAFLNLGVHL IGVGGIGVLT 

250 260 270 280 290 300 

310 320 330 340 350 360 

orf 47-1 . pep LGMMARTALGHTGNPIYPPPKAVPVAFWLMMAATAVRMVAVFSSGTAYTHSIRTSSVLFA 
1 1 1 1 1 II n t I I I I I i I I I II I I I I I II I I I I I I II I I I I I M I i I I I t I I I I I I I II t 
orf47ng-l LGMMARTALGHTGNSIYPPPKAVPVAFWLMMAATAVRMVAVFSSGTAYTHSIRTSSVLFA 

310 320 330 340 350 360 

370 380 
orf 47-1 . pep LALLVYAWKY I PWLIRPRSDGRPGX 
II I I I I I I I I I I I i II 1 I I I I I I I I 
orf47ng-l LALLVYAWKY I PWLIRPRSDGRPGX 

370 380 

Furthermore, ORF47ng-1 shows significant homology to an ORF from Pseudomonas stutzeri: 

gnl|PID|e246540 (Z73914) ORF396 protein (Pseudomonas stutzeri) Length = 396 
Score = 155 bits (389), Expect = 5e-37 



TGCCTTGGCT GTCGGCGGCT TTCGCGTTTG 
GTACAGGTGT ACCGCTGGTG GTATAAACCC 
GTGGATTCTG TTTGCCGGCT ATCTGTTTAC 
TCGGCGCGTC TTATTTCAAA CCTGCCTTCC 
ATCGGGGTCG GCGGTATCGG CGTGCTGACT 
CGCGCTCGGT CATACGGGCA ATTCGATTTA 
CCGTTGCGTT TTGGCTGATG ATGGCGGCAA 
GTATTTTCTT CCGGCACTGC CTACACGCAC 
TTTGTTTGCA CTCGCGCTGC TGGTGTATGC 
TGATCCGTCC GCGTTCGGAC GGCAGGCCCG 
Identities = 121/391 (30%), Positives = 169/391 (42%), Gaps = 21/391 (5%) 

Query: 7 PVWAMAFRPFYSLAALYGALSVLLWGFGYTGTHELSGFY WHAHEMIWGYAGLV 59 

P+W +AFRPF+ +LY L++ LW +TG GF WH HEM++G+A + 

Sbjct: 14 PIWRLAFRPFFLAGSLYALLAIPLWVAAWTGLWP— GFQPTGGWLAWHRHEMLFGFAMAI 71 

Query: 60 VIAFLLTAVATWTGQPPTRGGVLVGLTAFWLAARIAAFIPGWGAAASGILGTLFFWYGAV 119 

V FLLTAV TWTGQ G LVGL A WLAAR+ ++ G AA L LF 
Sbjct: 72 VAGFLLTAVQTWTGQTAPSGNRLVGLAAVWLAARL-GWLFGLPAAWLAPLDLLFLVALVW 130 

Query: 120 CMALPVIRSQNRRNYVAVFAIFVLGGTHAAFXXXXXXXXXXXXXXXXXXXXXMVSGFIGL 179 

MA + + +RNY V + G +V+ + L 

Sbjct: 131 MMAC^LWAVRQKRNYPIVVVLSLMLGADVLILTGLLQGNDALQRQGVIAGLWLVAALMAL 190 

Query: 180 IGMRIISFFTSKRLNVPQIPSP-KWVAQASLWLPMLTAILMAHGV MPWLS7\AFAFA 234 

IG R+X FFT + L P W+ A L + A+L A GV PL FA 

Sbjct: 191 IGGRVIPFFTQRGliGKVDAVKPWVWLDVALLVGTGVIALLHAFGVAMRPQPLLGLLFV-A 249 

Query: 235 AGVIFTVQVYRWWYKPVLKEPMLWILFAGYLFTGLGLIAVGASYF-KPAFXXXXXXXXXX 293 

GV +++ RW+ K + K +LW L L+ + + +F A 
Sbjct: 250 IGVGHLLRI^WYDKGIWKVGLLWSLHVAMLWLVVAAFGLALWHFGLLAQSSPSLHALSV 309 

Query: 294 XXXXXXXXXMMARTALGHTGNSIYPPPKAVPVAFWLXXXXXXXXXXXXFSSGTAYTHSIR 353 

M+AR LGHTG + P+AFL FS + 

Sbjct: 310 GSMSGLILAMIARVTLGHTGRPLQLPAGIIG-AFVL FNLGTTU^VFLSVAWPVGGLW 365 

Query: 354 TSSVLFALALLVYAWKYIPWLIRPRSDGRPG 384 

++V + LA +Y W+Y P L+ R DG PG 
Sbjct: 366 LAAVCWTLAFALYVWRYAPMLVAARVDGHPG 396 



Based on this analysis, it is predicted that the proteins from N.meningitidis and N.gonorrhoeae, and 
their epitopes, could be useful antigens for vaccines or diagnostics, or for raising antibodies. 



Example 85 

The following partial DNA sequence was identified in N.meningitidis <SEQ ID 715>: 

1 . .ATGCCGTCTG AAGGTTCAGA CGGCmTCGGT GyCGGGGAAy CAGAAGyGGT 

51 AGCGCATGCC CAATGAGACT TCGTGGGTTT TGAAGCGGGT GTTTTCCAAG 

101 CGTCCCCAGT TGTGGTAACG GTATCCGGTG TCyAArGTCA GCTTGGGyGT 

151 GATGTCGAAa CCGACACCGG CGATGACACC AAGACCyAmG CTGCTGATrC 

201 TGTkGCTTTC GTGATAGGsA GGTTTGyTGG kmksAsyTTG TAyrATwkkG 

251 CCTssCwsTG kAGmGCCkTk CkyTGGTkkA swGrwArTAG TCGTGGTTTy 

301 TkTTyyCACC GAATGAACyT GATGTTTAAC GTGTCCGTAG GCGACGCGCG 

351 CGCCGATATA GGGTTTGAAT TTATCGTTGA GTTTGAAATC GTAAATGGCG 

401 GACAAGCCGA GAGAAGAAAC GGCGTGGAAG CTGCCGTTTC CCTGATGTTT 

451 TGTTTGGGTT TCTTTGTAGT TGTTGTTTAT CTCTTCAGTA ACTTTTTTAG 

501 TAGAAGAATT ACTTTCTTTC CATTTTCTGT AACTGGCATA ATCTGCCGCT 

551 ATTCTCCAGC CGCCGAAATC . . 

This corresponds to the amino acid sequence <SEQ ID 716; ORF67>: 

1 ..MPSEGSDGXG XGEXEXVAHA QXDFVGFEAG VFQASPVVVT VSGVXXQLGX 

51 DVETDTGDDT KTXAADXVAF VIGRFXGXXL YXXAXXXXAX XWXXXXSRGF 

101 XXHRMNLMFN VSVGDARADI GFEFIVEFEI VNGGQAERRN GVEAAVSLMF 

151 CLGFFVVVVY LFSNFFSRRI TFFPFSVTGI ICRYSPAAEI .. 
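
The residues written as X in the partial sequence above arise where the DNA listing contains ambiguity codes (printed in lower case, e.g. m, y, r, k, s, w). The Python sketch below illustrates one way such codons can be handled; it is an assumption-laden example rather than the original software. The names IUPAC and translate_ambiguous are hypothetical, and the rule used (emit X whenever the possible codons do not all encode the same residue) is an assumed convention. Stop codons are returned as '*' here, whereas the listing above prints them as X.

```python
from itertools import product

BASES = "TCAG"
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    "".join(codon): aa for codon, aa in zip(product(BASES, repeat=3), AMINO_ACIDS)
}

# IUPAC nucleotide ambiguity codes and the bases each one can stand for.
IUPAC = {
    "A": "A", "C": "C", "G": "G", "T": "T",
    "R": "AG", "Y": "CT", "S": "CG", "W": "AT", "K": "GT", "M": "AC",
    "B": "CGT", "D": "AGT", "H": "ACT", "V": "ACG", "N": "ACGT",
}


def translate_ambiguous(dna: str) -> str:
    """Translate in-frame DNA, writing X where an ambiguous codon is unresolved."""
    dna = "".join(dna.split()).upper()
    protein = []
    for i in range(0, len(dna) - 2, 3):
        codon = dna[i:i + 3]
        # Expand the codon into every concrete codon it could represent.
        residues = {
            CODON_TABLE["".join(bases)]
            for bases in product(*(IUPAC[base] for base in codon))
        }
        protein.append(residues.pop() if len(residues) == 1 else "X")
    return "".join(protein)


# First 60 nucleotides of SEQ ID 715 as printed above.
first_60 = "ATGCCGTCTG AAGGTTCAGA CGGCmTCGGT GyCGGGGAAy CAGAAGyGGT AGCGCATGCC"
print(translate_ambiguous(first_60))
```

For these 60 nucleotides the result agrees with the first 20 residues of SEQ ID 716, including the four X positions.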

Computer analysis of this amino acid sequence gave the following results: 
Homology with a predicted ORF from N.gonorrhoeae 

ORF67 shows 51.8% identity over 199 aa overlap with a predicted ORF (ORF67ng) from 
N. gonorrhoeae: 
(In each alignment block below, the upper sequence is orf67.pep and the lower is orf67ng.)



MPSEGSDGXGXGEXEXVAHAQXDFVGFEAG 30 
I i I t ! I I I I II I Mill I I I 1 i I I 
TNFEIAVLSGMTVRVFYCARPAPVNGGRLKMPSEGSDGIGIGESEAVAHAQRGFVGFEAG 14 6 
90 100 110 120 130 140 

VFQASP\AA/TVSGVXXQLGXDVETDTGDDTKTXAADXVAFVIGRFXGXXLyXXAXXXXAX 90 

tlllllll|:t:l) MM:: ::: ]| llt:il I : : 
VFQASP\AA^AVAGVQGQAGRDVYAHARHRAEAQAAAAVAFLIGVFLRMSVRINRHCCVSI 206 

XWXXXXSRGFXXHRMNLMFNVSVGDARADIGFEFIVEFEIVNGGQAERRNGVEAAVSLMF 150 
: i : I:: : :|||||||:||llll:tlltlll[lillllllll II lit 

TRVGGKSTCYFFSRIDAVSDVSVGDARTDIGFEFWEFEIVNGGQAERRNGVECAVFLMF 266 

CLGFFW WYLFSNFFSRRITFF-PFSVTGI ICRYSPAAEI 1 90 

III :: I: I: : I : II Mill :|lll: 

RLLVFYVKLVAAKSFIILSFQLFYVHGIFIWPFPVTGIIRGDAPAAEWADRHPGVDGM 326 



The ORF67ng nucleotide sequence <SEQ ID 717> is predicted to encode a protein comprising 
amino acid sequence <SEQ ID 718>: 



1 MPSETVGSIV NVGVDESVGF SPPFPSIQHF YRFHRIHRIR LFRPPGPMQL 

51 NRHSHGSGNL GRGVWATVLS DKFPCGQVRI PACAGMTNFE IAVLSGMTVR 

101 VFYCARPAPV NGGRLKMPSE GSDGIGIGES EAVAHAQRGF VGFEAGVFQA 

151 SPVWAVAGV QGQAGRDVYA HARHRAEAQA AAAVAFLIGV FLRMSVRINR 

201 NCCVSITRVG GKSTCYFFSR IDAVSDVSVG DARTDIGFEF WEFEIVNGG 

251 QAERRNGVEC AVFLMFRLLV FYVKLVAAKS FIILSFQLFY VHGIFIWPF 

301 PVTGIIRGDA PAAEWADRH PGVDGMRTDV SEIIAYRAYF VFAWSGWFRI 

351 IVGNAFGGVG 



Based on the presence of several putative transmembrane domains in the gonococcal protein, it 
is predicted that the proteins from N.meningitidis and N.gonorrhoeae, and their epitopes, could be 
useful antigens for vaccines or diagnostics, or for raising antibodies. 



Example 86 

The following partial DNA sequence was identified in N.meningitidis <SEQ ID 719>: 

1 ATGTTTGCTT TTTTAGAAGC CTTTTTTGTC GAATACGGTT ATGCGGCTGT 

51 TTTTTTTGTA TTGGTCATCT GCGGTTTCGG CGTGCCGATT CCCGAGGATT 

101 TGACCTTGGT AACAGGCGGC GTGATTTCGG GTATGGGTTA TACCAATCCG 

151 CATATTATGT TTGCAGTCGG TATGCTCGGC GTATTGGTCG GGGACGGCAT 

201 CATGTTCGCC GCCGGACGAA TTTGGGGGCA GArArTCCTA rGGTTCArAC 

251 CTATTGCGsG CATCATGACG CCGrAACGTT ATGAGCAGGT TCAGGAAAAA 

301 TTCGACAAAT ACGGTAACTG GGTCTTATTT GTCGCCCGTT TCCTGCCCGG 

351 TTTGAGAACG GCCGTATTTG TTACAGCCGG TATCAGCCGC AAGGTTTCAT 

401 ACTTGCGTTT TATCATTATG GATGGACTGG CCGCA... 

This corresponds to the amino acid sequence <SEQ ID 720; ORF78>: 

1 MFAFLEAFFV EYG YAAVFFV LVICGFGVPI PEDLTLVTGG VISGMGYTNP 

51 H IMFAVGMLG VLVGDGIM FA AGRIWGQXXL XFXPIAXIMT PXRYEQVQEK 

101 F DKYGNWVLF VARFLPGL RT AVFVTAGISR KVSYLRFIIM DGLAA. . . 

Further work revealed the complete nucleotide sequence <SEQ ID 721>: 

1 ATGTTTGCTT TTTTAGAAGC CTTTTTTGTC GAATACGGTT ATGCGGCTGT 

51 TTTTTTTGTA TTGGTCATCT GCGGTTTCGG CGTGCCGATT CCCGAGGATT 

101 TGACCTTGGT AACAGGCGGC GTGATTTCGG GTATGGGTTA TACCAATCCG 

151 CATATTATGT TTGCAGTCGG TATGCTCGGC GTATTGGTCG GGGACGGCAT 

201 CATGTTCGCC GCCGGACGAA TTTGGGGGCA GAAAATCCTA AGGTTCAAAC 

251 CTATTGCGCG CATCATGACG CCGAAACGTT ATGAGCAGGT TCAGGAAAAA 

301 TTCGACAAAT ACGGTAACTG GGTCTTATTT GTCGCCCGTT TCCTGCCCGG 

351 TTTGAGAACG GCCGTATTTG TTACAGCCGG TATCAGCCGC AAGGTTTCAT 

401 ACTTGCGTTT TATCATTATG GATGGACTGG CCGCACTGAT TTCCGTCCCT 

451 ATTTGGATTT ATCTGGGCGA ATACGGTGCG CACAACATCG ATTGGCTGAT 
501 GGCGAAAATG CACAGCCTGC AATCGGGTAT TTTTGTTATC TTGGGTATAG 

551 GTGCGACCGT TGTCGCTTGG ATTTGGTGGA AAAAACGCCA ACGTATCCAG 

601 TTTTACCGCA GCAAATTGAA AGAAAAGCGG GCGCAACGCA AAGCCGCCAA 

651 GGCAGCCAAA AAAGCCGCGC AAAGCAAACA ATAA 

This corresponds to the amino acid sequence <SEQ ID 722; ORF78-1>: 



1 MFAFLEAFFV EYGYAAVFFV LVICGFGVPI PEDLTLVTGG VISGMGYTNP 

51 HIMFAVGMLG VLVGDGIMFA AGRIWGQKIL RFKPIARIMT PKRYEQVQEK 

101 FDKYGNWVLF VARFLPGLRT AVFVTAGISR KVSYLRFIIM DGLAALISVP 

151 IWIYLGEYGA HNIDWLMAKM HSLQSGIFVI LGIGATVVAW IWWKKRQRIQ 

201 FYRSKLKEKR AQRKAAKAAK KAAQSKQ* 
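
The partial sequence ORF78 contains X at positions that are resolved in the complete sequence ORF78-1. A simple consistency check between a partial and a complete listing can be sketched in Python as follows. This is an illustrative example only: the function name resolve_wildcards is hypothetical, and the two fragments are residues 71-100 of SEQ ID 720 and SEQ ID 722 as printed above.

```python
def resolve_wildcards(partial: str, complete: str, start: int = 1, wildcard: str = "X"):
    """Map each wildcard position in `partial` to the residue found in `complete`.

    Raises ValueError if the two sequences disagree at any non-wildcard position.
    `start` is the residue number of the first character of both fragments.
    """
    if len(partial) != len(complete):
        raise ValueError("fragments must have the same length")
    resolved = {}
    for offset, (p, c) in enumerate(zip(partial, complete)):
        position = start + offset
        if p == wildcard:
            resolved[position] = c
        elif p != c:
            raise ValueError(f"conflict at residue {position}: {p} vs {c}")
    return resolved


# Residues 71-100 of ORF78 (partial) and ORF78-1 (complete).
orf78_partial = "AGRIWGQXXLXFXPIAXIMTPXRYEQVQEK"
orf78_1_complete = "AGRIWGQKILRFKPIARIMTPKRYEQVQEK"

resolved = resolve_wildcards(orf78_partial, orf78_1_complete, start=71)
print(resolved)
```

In this fragment the six ambiguous positions resolve to Lys, Ile or Arg, and no other position conflicts.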

Computer analysis of this amino acid sequence predicts several transmembrane domains, and also 
gave the following results: 

Homology with the dedA homologue of H.influenzae (accession number P45280) 
ORF78 and the dedA homologue show 58% aa identity in 144aa overlap: 

Orf78:   4 FLEAFFVEYGYAAVFFVLVICGFGVPIPEDLTLVTGGVISGM--GYTNPHIMFAVGMLGV 61
           FL  FF EYGY AV FVL+ICGFGVPIPED+TLV+GGVI+G+     N H+M  V M+GV
DedA:   20 FLIGFFTEYGYWAVLFVLIICGFGVPIPEDITLVSGGVIAGLYPENVNSHLMLLVSMIGV 79

Orf78:  62 LVGDGIMFAAGRIWGQXXLXFXPIAXIMTPXRYEQVQEKFDKYGNWVLFVARFLPGLRTA 121
           L GD  M+  GRI+G   L F PI  I+T  R   V+EKF +YGN VLFVARFLPGLR
DedA:   80 LAGDSCMYWLGRIYGTKILRFRPIRRIVTLQRLRMVREKFSQYGNRVLFVARFLPGLRAP 139

Orf78: 122 VFVTAGISRKVSYLRFIIMDGLAA 145
           +++ +GI+R+VSY+RF+++D  AA
DedA:  140 IYMVSGITRRVSYVRFVLIDFCAA 163



Homology with a predicted ORF from N.meningitidis (strain A) 

ORF78 shows 93.8% identity over a 145aa overlap with an ORF (ORF78a) from strain A of 
N.meningitidis: 

10 20 30 40 50 60 

orf 7 8 . pep MFAFLEAFFVEYG YAAVFFVLVICGFGVPI PEDLTLVTGGVISGMGYTNPH IMFAVGMLG 
lll:inilllllllll[||IIIIIIilllllllilllllMlillltlllMIIIIIII 
orf7Ba MFALLEAFFVEYG YAAVFFVLVICGFGVPI PEDLTLVTGGVISGMGYTNPH IMFAVGMLG 

10 20 30 40 50 60 



70 80 90 100 110 120 

or f 7 8 . pep VLVGDGIM FAAGRIWGQXXLXFXPIAXIMTPXRYEQVQEKFDKYGN WVLFVARFLPGLRT 
t I I I I I t t 1 t I I I I I I i ) I III I I I I II I I I I I i M I I I I I I I t I I It I I I I I 
orf 78a VLVGDGIM FAAGRIWGQKILKFKPIARIMTPKRYAQVQEKFDKYGN WVLFVARFLPGLRT 

70 80 90 100 110 120 



130 140 
orf 78 .pep AVFV TAGISRKVSYLR FIIMDGLAA 

t II I II II I I I I I I I I I : I II ) I II 
orf 78a AVFV TAGISRKVSYLR FLIMDGLAALISVPVWI YLGEYGAHNIDWLMAKMHSL QSGIFIA 
130 140 150 160 170 180 

The complete length ORF78a nucleotide sequence <SEQ ID 723> is: 



1 ATGTTTGCCC TTTTGGAAGC CTTTTTTGTC GAATACGGCT ATGCGGCCGT 

51 GTTTTTCGTT TTGGTCATCT GCGGTTTCGG CGTGCCGATT CCCGAGGATT 

101 TGACCTTGGT AACAGGCGGC GTGATTTCGG GTATGGGTTA TACCAATCCG 

151 CATATTATGT TTGCAGTCGG TATGCTCGGC GTATTGGTCG GGGACGGCAT 

201 CATGTTCGCC GCCGGACGCA TCTGGGGGCA GAAAATCCTC AAGTTCAAAC 

251 CGATTGCGCG CATCATGACG CCGAAACGTT ACGCACAGGT TCAGGAAAAA 

301 TTCGACAAAT ACGGCAACTG GGTGTTATTT GTCGCTCGTT TCCTGCCCGG 

351 TTTGCGGACT GCCGTTTTCG TTACCGCCGG CATCAGCCGC AAAGTATCGT 

401 ATCTGCGCTT TCTGATTATG GACGGGCTTG CCGCGCTGAT TTCCGTGCCC 

451 GTTTGGATTT ACTTGGGCGA GTACGGCGCG CACAACATCG ATTGGCTGAT 
501 GGCGAAAATG CACAGCCTGC AATCCGGCAT CTTCATCGCA TTGGGCGTGC 

551 TGGCGGCGGC GCTGGCGTGG TTCTGGTGGC GCAAACGCCG ACATTATCAG 

601 CTTTACCGCG CACAATTGAG CGAAAAACGC GCCAAACGCA AGGCGGAAAA 

651 GGCAGCGAAA AAAGCGGCAC AGAAGCAGCA GTAA 

This encodes a protein having amino acid sequence <SEQ ID 724>: 



1 MFALLEAFFV EYG YAAVFFV LVICGFGVPI PEDLTLVTGG VISGMGYTNP 

51 H IMFAVGMLG VLVGDGIM FA AGRIWGQKIL KFKPIARIMT PKRYAQVQEK 

101 FDKYGN WVLF VARFLPGLRT AVFVT AGISR KVSYL RFLIM DGLAALISVP 

151 VWIYLGEYGA HNIDWLMAKM HSLQ SGIFIA LGVLAAALAW FW WRKRRHYQ 

201 LYRAQLSEKR AKRKAEKAAK KAAQKQQ* 

ORF78a and ORF78-1 show 89.0% identity in 227 aa overlap: 



10 20 30 40 50 60 

orf 78a . pep MFALLEAFFVEYGYAAVFFVLVICGFGVPIPEDLTLVTGGVISGMGYTNPHIMFAVGMLG 
)||:|||||MllltlltlllllltlllllllllllillllMII)illIllilllllll 
or f 7 8 - 1 MFAFLEAFFVEYGYAAVFFVLVICGFGVPI PEDLTLVTGGVI SGMGYTNPHIMFAVGMLG 

10 20 30 40 50 60 



70 80 90 100 110 120 

orf 78a . pep VLVGDGIMFAAGRIWGQKILKFKPIARIMTPKRYAQVQEKFDKYGNWVLFVARFLPGLRT 
I t I I t n I 1 t M I I M i i I t : t t t 1 I I I I I I I I I I i I t I I M I M I M I t I I I I I I I I I 
orf 78-1 VLVGDGIMFAAGRIWGQKILRFKPIARIMTPKRYEQVQEKFDKYGNWVLFVARFLPGLRT 

70 80 90 100 110 120 



130 140 150 160 170 180 

orf 78a. pep AVFVTAGISRKVSYLRFLIMDGLAALISVPVWIYLGEYGAHNIDWLMAKMHSLQSGIFIA 

[illMlllltillllt:[lllllltllll:llllltjlllilli)IM Mil: 

orf 78-1 AVFVTAGISRKVSYLRFIIMDGLAALISVPIWIYLGEYGAHNIDWLMAKMHSLQSGIFVI 

130 140 150 160 170 180 

190 200 210 220 

or f 7 8a . pep LGVLAAALAWFWWRKRRHYQLYRAQLSEKRAKRKAEKAAKKAAQKQQX 
II: |:::tl:||:||:: I : I I : : I : I I t i : I i I llttllM::lt 
or f 7 8- 1 LG IGATWAWIWWKKRQRIQFYRSKLKEKRAQRKAAKAAKKAAQSKQX 

190 200 210 220 



Homology with a predicted ORF from N.gonorrhoeae 

ORF78 shows 97.4% identity over 38 aa overlap with a predicted ORF (ORF78ng) from N. 
gonorrhoeae: 



orf 78 . pep XXLXFXPIAXIMTPXRYEQVQEKFDKYGNWVLFVARFLPGLRTAVFVTAGISRKVSYLRF 137 

I I I I I i I t I I I I I I I I I I I I I I t I I I I I I i 
orf78ng YPVLFVARFLPGLRTAVFVTAGISRKVSYLRF 32 

orf78.pep IIMDGLAA 145 
: I I I I I I t 

orf78ng LIMDGLAALISVPVWIYLGEYGAHNIDWLMAKMHSLQSGIFIALGVLAAALAWFWWRKRR 92 

The ORF78ng nucleotide sequence <SEQ ID 725> is predicted to encode a protein comprising 
amino acid sequence <SEQ ID 726>: 



1 . . YP VLFVARFL PGLRTAVFV T AGISRKVSYL RFLIMDGLAA LISVPVWI YL 
51 GEYGAHNIDW LMAKMHSL QS GIFIALGVLA AALAWF WWRK RRHYQLYRAQ 
101 LSEKRAKRKA EKAAKKAAQK QQ* 

Further work revealed the complete gonococcal nucleotide sequence <SEQ ID 727>: 



1 atgtttgccc tttTggaagc CTTTTTTGTC GAAtacggCt atgcGGCCGT 

51 GTTTTTCGTT TTGGTCATCT GCGGTTTCGG CGTGCCGATT CCCGAAGATT 

101 TGACCTTGGT AACGGGCGGC GTGATTTCGG GTATGGGTTA TACCAATCCG 

151 CATATTATGT TTGCGGTCGG TATGCTCGGC GTGTTGGCGG GCGACGGCGT 

201 GATGTTTGCC GCCGGACGCA TCTGGGGGCA GAAAATCCTC AAGTTCAAAC 

251 CGATTGCGCG CATCATGACG CCGAAACGTT ACGCGCAGGT TCAGGAAAAA 

301 TTCGACAAAT ACGGCAACTG GGTTCTGTTT GTCGCCCGTT TCCTGCCGGG 
351 TTTGCGGACT GCCGTTTTCG TTACCGCCGG CATCAGCCGC AAAGTATCGT 

401 ATCTGCGCTT TCTGATTATG GACGGGCTGG CCGCGCTGAT TTCCGTGCCC 

451 GTTTGGATTT ACTTGGGCGA GTACGGCGCG CACAACATCG ATTGGCTGAT 

501 GGCGAAAATG CACAGCCTGC AATCGGGCAT CTTCATCGCA TTGGGCGTGC 

551 TGGCGGCGGC GCTGGCGTGG TTCTGGTGGC GCAAACGCCG ACATTATCAG 

601 CTTTACCGCG CACAATTGAG CGAAAAACGC GCCAAACGCA AGGCGGAAAA 

651 GGCAGCGAAA AAAGCGGCAC AGAAGCAGCA GTAa 

This corresponds to the amino acid sequence <SEQ ID 728; ORF78ng-1>: 

1 MFALLEAFFV EYGYAAVFFV LVICGFGVPI PEDLTLVTGG VISGMGYTNP 

51 HIMFAVGMLG VLAGDGVMFA AGRIWGQKIL KFKPIARIMT PKRYAQVQEK 

101 FDKYGNWVLF VARFLPGLRT AVFVTAGISR KVSYLRFLIM DGLAALISVP 

151 VWIYLGEYGA HNIDWLMAKM HSLQSGIFIA LGVLAAALAW FWWRKRRHYQ 

201 LYRAQLSEKR AKRKAEKAAK KAAQKQQ* 

ORF78ng-1 and ORF78-1 show 88.1% identity in 227 aa overlap: 

10 20 30 40 50 60 

orf 78-1 . pep MFAFLEAFFVEYGYAAVFFVLVICGFGVPIPEDLTLVTGGVISGMGYTNPHIMFAVGMLG 
|||:l||i]|l|llllllittlltllllilMllltilllllll)IIMil)lillltll 
orf78ng-l MFALLEAFFVEYGYAAVFFVLVICGFGVPIPEDLTLVTGGVISGMGYTNPHIMFAVGMLG 

10 20 30 40 50 60 

70 80 90 100 110 120 

orf 78-1 . pep VLVGDGIMFAAGRIWGQKILRFKPIARIMTPKRYEQVQEKFDKYGNWVLFVARFLPGLRT 
I I : I I I : t I t M t I I [ I I I I : I I I I I I I I 1 I I I I I I 1 I t I M I I I I I I i I I i 1 I I I I I I 
orf78ng-l VLAGDGVMFAAGRIWGQKILKFKPIARIMTPKRYAQVQEKFDKYGNWVLFVARFLPGLRT 

70 80 90 100 110 120 

130 140 150 160 170 180 

orf 78-1. pep AVFVTAGISRKVSYLRFIIMDGLAALISVPIWIYLGEYGAHNIDWLMAKMHSLQSGIFVI 

I I I I I I I I I I I f I I I I I : I I I t M M I I I I : I I I I t I I I I [ I I I I I I I I t M I I I I I I : 
orf78ng-l AVFVTAGISRKVSYLRFLIMDGLAALISVPVWIYLGEYGAHNIDWLMAKMHSLQSGIFIA 

130 140 150 160 170 180 

190 200 210 220 

orf78-1.pep LGIGATVVAWIWWKKRQRIQFYRSKLKEKRAQRKAAKAAKKAAQSKQX 
II: t : : : II : [ I : It : : I : I I : : I : I II I : II 1 I I M I I I I : : II 
or f 7 8ng- 1 LGVLAAALAWFWWRKRRHYQLYRAQLSEKRAKRKAEKAAKKAAQKQQX 

190 200 210 220 

Furthermore, ORF78ng-1 shows homology to the dedA protein from H.influenzae: 

sp|P45280|YG29_HAEIN HYPOTHETICAL PROTEIN HI1629 >gi|1073983|pir||D64133 dedA 
protein (dedA) homolog - Haemophilus influenzae (strain Rd KW20) 
>gi|1574476 (U32836) dedA protein (dedA) [Haemophilus influenzae] Length = 212 
Score = 223 bits (563), Expect = 7e-58 

Identities = 108/182 (59%), Positives = 140/182 (76%), Gaps = 2/182 (1%) 



Query:   5 LEAFFVEYGYAAVFFVLVICGFGVPIPEDLTLVTGGVISGM--GYTNPHIMFAVGMLGVL 62
           L  FF EYGY AV FVL+ICGFGVPIPED+TLV+GGVI+G+     N H+M  V M+GVL
Sbjct:  21 LIGFFTEYGYWAVLFVLIICGFGVPIPEDITLVSGGVIAGLYPENVNSHLMLLVSMIGVL 80

Query:  63 AGDGVMFAAGRIWGQKILKFKPIARIMTPKRYAQVQEKFDKYGNWVLFVARFLPGLRTAV 122
           AGD  M+  GRI+G KIL+F+PI RI+T +R   V+EKF +YGN VLFVARFLPGLR  +
Sbjct:  81 AGDSCMYWLGRIYGTKILRFRPIRRIVTLQRLRMVREKFSQYGNRVLFVARFLPGLRAPI 140

Query: 123 FVTAGISRKVSYLRFLIMDGLAALISVPVWIYLGEYGAHNIDWLMAKMHSLQSGIFIALG 182
           ++ +GI+R+VSY+RF+++D  AA+ISVP+WIYLGE GA N+DWL  ++   Q  I+I +G
Sbjct: 141 YMVSGITRRVSYVRFVLIDFCAAIISVPIWIYLGELGAKNLDWLHTQIQKGQIVIYIFIG 200

Query: 183 VL 184
            L
Sbjct: 201 YL 202
Based on this analysis, including the presence of putative transmembrane domains, it is predicted 
that these proteins from N.meningitidis and N.gonorrhoeae, and their epitopes, could be useful 
antigens for vaccines or diagnostics, or for raising antibodies. 

Example 87 

The following partial DNA sequence was identified in N.meningitidis <SEQ ID 729>: 

1 ATGAAAAAAT TATTGGCGGC CGTGATGATG GCAGGTTTGG CAGGCGCGGT 

51 TTCCGCCGCC GGAGTCCACG TTGAGGACGG CTGGGCGCGC ACCACCGTCG 

101 AAGGTATGAA AATAGGCGGC GCGTTCATGA AAATCCACAA CGACGAAGCC 

151 AAACAAGACT TTTTGCTCGG CGGAAGCAGC CCCGTTGCCG ACCGCGTCGA 

201 AGTGCATACC CACATCAACG ACAACGGCGT GATGCGGATG CGCGAAGTCG 

251 AAGGCGGCGT GCCTTTGGAA GCGAAATCCG TTACCGAACT CAAACCCGGC 

301 AGCTATCATG TGATGTTTAT GGGTTTGAAA AAACAATTAA AAGAGGGCGA 

351 TAAAATTCCC GTTACCCTGA AATTTAAAAA CGCCAAAGCG CAAACCGTCC 

401 AACTGGAAGT CAAAATCGCG CCGATGCCGG CAATGAACCA C... 

This corresponds to the amino acid sequence <SEQ ID 730; ORF79>: 

1 MKKLLAAVMM AGLAGA VSAA GVHVEDGWAR TTVEGMKIGG AFMKIHNDEA 
51 KQDFLLGGSS PVADRVEVHT HINDNGVMRM REVEGGVPLE AKSVTELKPG 
101 SYHVMFMGLK KQLKEGDKIP VTLKFKNAKA QTVQLEVKIA PMPAMNH. . 

Further work revealed the complete nucleotide sequence <SEQ ID 731>: 

1 ATGAAAAAAT TATTGGCGGC CGTGATGATG GCAGGTTTGG CAGGCGCGGT 

51 TTCCGCCGCC GGAGTCCACG TTGAGGACGG CTGGGCGCGC ACCACCGTCG 

101 AAGGTATGAA AATAGGCGGC GCGTTCATGA AAATCCACAA CGACGAAGCC 

151 AAACAAGACT TTTTGCTCGG CGGAAGCAGC CCCGTTGCCG ACCGCGTCGA 

201 AGTGCATACC CACATCAACG ACAACGGCGT GATGCGGATG CGCGAAGTCG 

251 AAGGCGGCGT GCCTTTGGAA GCGAAATCCG TTACCGAACT CAAACCCGGC 

301 AGCTATCATG TGATGTTTAT GGGTTTGAAA AAACAATTAA AAGAGGGCGA 

351 TAAAATTCCC GTTACCCTGA AATTTAAAAA CGCCAAAGCG CAAACCGTCC 

401 AACTGGAAGT CAAAATCGCG CCGATGCCGG CAATGAACCA CGGTCATCAC 

451 CACGGCGAAG CGCATCAGCA CTAA 

This corresponds to the amino acid sequence <SEQ ID 732; ORF79-1>: 

1 MKKLLAAVMM AGLAGA VSAA GVHVEDGWAR TTVEGMKIGG AFMKIHNDEA 
51 KQDFLLGGSS PVADRVEVHT HINDNGVMRM REVEGGVPLE AKSVTELKPG 
101 SYHVMFMGLK KQLKEGDKIP VTLKFKNAKA QTVQLEVKIA PMPAMNHGHH 
151 HGEAHQH* 

Computer analysis of this amino acid sequence revealed a putative leader peptide and also gave the 
following results: 

Homology with a predicted ORF from N.meningitidis (strain A) 

ORF79 shows 94.6% identity over a 147aa overlap with an ORF (ORF79a) from strain A of 
N.meningitidis: 

                    10        20        30        40        50        60
orf79.pep   MKKLLAAVMMAGLAGAVSAAGVHVEDGWARTTVEGMKIGGAFMKIHNDEAKQDFLLGGSS
            || ||||||||||||||||||:|||||||||||||||:||||||||||||||||||||||
orf79a      MKXLLAAVMMAGLAGAVSAAGIHVEDGWARTTVEGMKMGGAFMKIHNDEAKQDFLLGGSS

                    70        80        90       100       110       120
orf79.pep   PVADRVEVHTHINDNGVMRMREVEGGVPLEAKSVTELKPGSYHVMFMGLKKQLKEGDKIP
            |||||||||||||||||||||||||||||||||||||||||||||||| ||||| |||||
orf79a      PVADRVEVHTHINDNGVMRMREVEGGVPLEAKSVTELKPGSYHVMFMGXKKQLKXGDKIP

                   130       140       150
orf79.pep   VTLKFKNAKAQTVQLEVKIAPMPAMNH
            |||||||||||||||||| ||| ||:|
orf79a      VTLKFKNAKAQTVQLEVKTAPMSAMDHGHHHGEAHQHX

The complete length ORF79a nucleotide sequence <SEQ ID 733> is: 

1 ATGAAANAAC TATTGGCAGC CGTGATGATG GCAGGTTTGG CAGGCGCGGT 

51 TTCCGCCGCC GGAATCCACG TTGAGGACGG CTGGGCGCGC ACCACCGTCG 

101 AAGGTATGAA AATGGGCGGC GCGTTCATGA AAATCCACAA CGACGAAGCC 

151 AAACAAGACT TTTTGCTCGG CGGAAGCAGC CCTGTTGCCG ACCGCGTCGA 

201 AGTGCATACC CATATCAATG ATAACGGTGT GATGCGGATG CGCGAAGTCG 

251 AAGGCGGCGT GCCTTTGGAG GCGAAATCCG TTACCGAACT CAAACCCGGC 

301 AGCTATCATG TCATGTTTAT GGGTNTGAAA AAACAATTAA AAGANGGCGA 

351 CAAGATTCCC GTTACCCTGA AATTTAAAAA CGCCAAAGCA CAAACCGTCC 

401 AACTGGAAGT CAAAACCGCG CCGATGTCGG CAATGGACCA CGGTCATCAC 

451 CACGGCGAAG CGCATCAGCA CTAA 

This encodes a protein having amino acid sequence <SEQ ID 734>: 

1 MKXLLAAVMM AGLAGAVSAA GIHVEDGWAR TTVEGMKMGG AFMKIHNDEA 
51 KQDFLLGGSS PVADRVEVHT HINDNGVMRM REVEGGVPLE AKSVTELKPG 
101 SYHVMFMGXK KQLKXGDKIP VTLKFKNAKA QTVQLEVKTA PMSAMDHGHH 
151 HGEAHQH* 

ORF79a and ORF79-1 show 94.9% identity in 157 aa overlap: 

                    10        20        30        40        50        60
orf79a.pep  MKXLLAAVMMAGLAGAVSAAGIHVEDGWARTTVEGMKMGGAFMKIHNDEAKQDFLLGGSS
            || ||||||||||||||||||:|||||||||||||||:||||||||||||||||||||||
orf79-1     MKKLLAAVMMAGLAGAVSAAGVHVEDGWARTTVEGMKIGGAFMKIHNDEAKQDFLLGGSS

                    70        80        90       100       110       120
orf79a.pep  PVADRVEVHTHINDNGVMRMREVEGGVPLEAKSVTELKPGSYHVMFMGXKKQLKXGDKIP
            |||||||||||||||||||||||||||||||||||||||||||||||| ||||| |||||
orf79-1     PVADRVEVHTHINDNGVMRMREVEGGVPLEAKSVTELKPGSYHVMFMGLKKQLKEGDKIP

                   130       140       150
orf79a.pep  VTLKFKNAKAQTVQLEVKTAPMSAMDHGHHHGEAHQHX
            |||||||||||||||||| ||| ||:||||||||||||
orf79-1     VTLKFKNAKAQTVQLEVKIAPMPAMNHGHHHGEAHQHX
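
The individual differences underlying an identity figure such as the 94.9% quoted above can be listed directly from the two sequences. The Python sketch below is illustrative only; the function name substitutions is hypothetical, and the two strings are SEQ ID 734 (ORF79a) and SEQ ID 732 (ORF79-1) transcribed from this section without the terminal stop, on the assumption that the transcription is free of OCR errors.

```python
ORF79A = "".join("""
MKXLLAAVMM AGLAGAVSAA GIHVEDGWAR TTVEGMKMGG AFMKIHNDEA
KQDFLLGGSS PVADRVEVHT HINDNGVMRM REVEGGVPLE AKSVTELKPG
SYHVMFMGXK KQLKXGDKIP VTLKFKNAKA QTVQLEVKTA PMSAMDHGHH
HGEAHQH
""".split())

ORF79_1 = "".join("""
MKKLLAAVMM AGLAGAVSAA GVHVEDGWAR TTVEGMKIGG AFMKIHNDEA
KQDFLLGGSS PVADRVEVHT HINDNGVMRM REVEGGVPLE AKSVTELKPG
SYHVMFMGLK KQLKEGDKIP VTLKFKNAKA QTVQLEVKIA PMPAMNHGHH
HGEAHQH
""".split())


def substitutions(seq_a: str, seq_b: str):
    """List (position, residue_a, residue_b) for every mismatch, 1-based."""
    if len(seq_a) != len(seq_b):
        raise ValueError("sequences must have the same length")
    return [
        (pos, a, b)
        for pos, (a, b) in enumerate(zip(seq_a, seq_b), start=1)
        if a != b
    ]


differences = substitutions(ORF79A, ORF79_1)
identity = round(100.0 * (len(ORF79A) - len(differences)) / len(ORF79A), 1)
print(differences, identity)
```

Three of the eight mismatches are undetermined residues (X) in the strain A sequence; in this count they are treated as mismatches.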

Homology with a predicted ORF from N.gonorrhoeae 

ORF79 shows 96.1% identity over 76 aa overlap with a predicted ORF (ORF79ng) from 
N.gonorrhoeae: 

orf79.pep   FMKIHNDEAKQDFLLGGSSPVADRVEVHTHINDNGVMRMREVEGGVPLEAKSVTELKPGS 101
                                          ||||||||||||:|||||||||||||||||
orf79ng                                   INDNGVMRMREVKGGVPLEAKSVTELKPGS 30

orf79.pep   YHVMFMGLKKQLKEGDKIPVTLKFKNAKAQTVQLEVKIAPMPAMNH 147
            ||||||||||||||||||||||||||||||||||||| ||| ||||
orf79ng     YHVMFMGLKKQLKEGDKIPVTLKFKNAKAQTVQLEVKTAPMSAMNHGHHHGEAHQH 86

An ORF79ng nucleotide sequence <SEQ ID 735> was predicted to encode a protein comprising 
amino acid sequence <SEQ ID 736>: 

1 ..INDNGVMRMR EVKGGVPLEA KSVTELKPGS YHVMFMGLKK QLKEGDKIPV 
51 TLKFKNAKAQ TVQLEVKTAP MSAMNHGHHH GEAHQH* 

Further work revealed the complete gonococcal DNA sequence <SEQ ID 737>: 
1 ATGAAAAAAT TATTGGCAGC CGTGATGATG GCAGGTTTGG CAGGCGCGGT 

51 TTccgccgCc GGagTccAtG TCGAggACGG CTGGGCGCGc accaCTGtcg 

101 aaggtATgaa aatggGCGGC GCgttCATga aaATCCACAA CGACGaaGcc 

151 atacaaGACt ttgtgcTCgg CGGaagcatg cccgttgccg accgcGTCGA 

201 AGTGCAtaca cacATCAACG ACAACGGCGT GATGCGTATG CGCGAAGTCA 

251 AAGGCGGCGT GCCTTTGGAG GCGAAATCCG TTACCGAACT CAAACCCGGC 

301 AGCTATCACG TGATGTTTAT GGGTTTGAAA AAACAACTGA AAGAGGGCGA 

351 CAAGATTCCC GTTACCCTGA AATTTAAAAA CGCCAAAGCG CAAACCGTCC 

401 AACTGGAAGT CAAAACCGCG CCGATGTCGG CAATGAACCA CGGTCATCAC 

451 CACGGCGAAG CGCATCAGCA CTAA 

This corresponds to the amino acid sequence <SEQ ID 738; ORF79ng-1>: 



1 MKKLLAAVMM AGLAGAVSAA GVHVEDGWAR TTVEGMKMGG AFMKIHNDEA 

51 IQDFVLGGSM PVADRVEVHT HINDNGVMRM REVKGGVPLE AKSVTELKPG 

101 SYHVMFMGLK KQLKEGDKIP VTLKFKNAKA QTVQLEVKTA PMSAMNHGHH 

151 HGEAHQH* 

ORF79ng-1 and ORF79-1 show 95.5% identity in 157 aa overlap: 



                    10        20        30        40        50        60
orf79-1.pep MKKLLAAVMMAGLAGAVSAAGVHVEDGWARTTVEGMKIGGAFMKIHNDEAKQDFLLGGSS
            |||||||||||||||||||||||||||||||||||||:|||||||||||| |||:||||
orf79ng-1   MKKLLAAVMMAGLAGAVSAAGVHVEDGWARTTVEGMKMGGAFMKIHNDEAIQDFVLGGSM

                    70        80        90       100       110       120
orf79-1.pep PVADRVEVHTHINDNGVMRMREVEGGVPLEAKSVTELKPGSYHVMFMGLKKQLKEGDKIP
            |||||||||||||||||||||||:||||||||||||||||||||||||||||||||||||
orf79ng-1   PVADRVEVHTHINDNGVMRMREVKGGVPLEAKSVTELKPGSYHVMFMGLKKQLKEGDKIP

                   130       140       150
orf79-1.pep VTLKFKNAKAQTVQLEVKIAPMPAMNHGHHHGEAHQHX
            |||||||||||||||||| ||| |||||||||||||||
orf79ng-1   VTLKFKNAKAQTVQLEVKTAPMSAMNHGHHHGEAHQHX

Furthermore, ORF79ng-1 shows significant homology to a protein from Aquifex aeolicus: 

gi|2983695 (AE000731) putative protein [Aquifex aeolicus] Length = 151 

Score = 63.6 bits (152), Expect = 6e-10 

Identities = 38/114 (33%), Positives = 58/114 (50%), Gaps = 1/114 (0%) 

Query:  24 VEDGWARTTVEGMKMGGAFMKIHNDEAIQDFVLGGSMPVADRVEVHTHINDNGVMRMREV 83
           V+  W      G       M I N+    D+++G    +A RVE+H  + +N V +M
Sbjct:  27 VKHPWVMEPPPGPNTTMMGMIIVNEGDEPDYLIGAKTDIAQRVELHKTVIENDVAKMVPQ 86

Query:  84 KGGVPLEAKSVTELKPGSYHVMFMGLKKQLKEGDKIPVTLKFKNAKAQTVQLEV 137
           +  + +  K   E K   YHVM +GLKK++KEGDK+ V L F+ +   TV+  V
Sbjct:  87 ER-IEIPPKGKVEFKHHGYHVMIIGLKKRIKEGDKVKVELIFEKSGKITVEAPV 139
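
The summary line of a database hit such as the one above can be parsed and checked arithmetically. The Python sketch below is an illustration written for this purpose and is not part of the search software; the names PATTERN, parse_summary and truncated_percent are hypothetical. For this particular hit the printed percentages are consistent with integer truncation of the ratio, which is assumed here rather than taken from any documentation.

```python
import re

# Summary line transcribed from the Aquifex aeolicus hit above.
SUMMARY = "Identities = 38/114 (33%), Positives = 58/114 (50%), Gaps = 1/114 (0%)"

# Matches fields of the form "Name = n/d (p%)".
PATTERN = re.compile(r"(\w+) = (\d+)/(\d+) \((\d+)%\)")


def parse_summary(line: str):
    """Return {field name: (count, alignment length, printed percentage)}."""
    return {
        name: (int(count), int(length), int(percent))
        for name, count, length, percent in PATTERN.findall(line)
    }


def truncated_percent(count: int, length: int) -> int:
    """Percentage rounded down to a whole number (assumed reporting convention)."""
    return 100 * count // length


fields = parse_summary(SUMMARY)
consistent = all(
    truncated_percent(count, length) == printed
    for count, length, printed in fields.values()
)
print(fields, consistent)
```

The same check can be applied to the other hit summaries in this document to flag counts or percentages that may have been corrupted in transcription.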



Based on this analysis, it is predicted that the proteins from N.meningitidis and N.gonorrhoeae, and 
their epitopes, could be useful antigens for vaccines or diagnostics, or for raising antibodies. 

ORF79-1 (15.6kDa) was cloned in the pET vector and expressed in E.coli, as described above. The 
products of protein expression and purification were analyzed by SDS-PAGE. Figure 18A shows 
the results of affinity purification of the His-fusion protein. Purified His-fusion protein was used 
to immunise mice, whose sera were used for ELISA (positive result) and FACS analysis (Figure 
18B). These experiments confirm that ORF79-1 is a surface-exposed protein, and that it is a useful 
immunogen. 
Example 88 

The following DNA sequence, believed to be complete, was identified in N.meningitidis <SEQ ID 
739>: 



1 ATGACGGTAA CTGCGGCCGA AGGCGGCAAA GCTGCCAAGG CGTTAAAAAA 

51 ATATCTGATT ACGGGCATTT TGGTCTGGCT GCCGATTGCG GTMCGGTTT 

101 GGGTGGTTTC CTATATCGTT TCCGCGTCCG ATCAGCTCGT CAACCTGCTG 

151 CCGAAGCAAT GGCGGCCGCA ATATGTTTTG GGGTTTAATA TCCCGGGGCT 

201 GGGCGTTATC GTTGCCATTG CCGTATTGTT TGTAACCGGA TTGTTTGCCG 

251 CCAACGTATT GGGTCGGCAG ATCCTCGCCG CGTGGGACAG CCTGTTGGGG 

301 CGGATTCCGG TTGTGAAAtC CATCTATTCG AGTGTGAAAA AAGTATCCGA 

351 ATacgTGCTG TCCGACAGCA GCCGTTCGTT TAAAACGCCG GTACTCGTGC 

401 CGTTTCCCCA GCCCGGTATT TGGACGATyG CTTTCGTGTC AGGGCAGGTG 

451 TCGAATGCGG TTAAGGCCGC ATTGCCGAAs GACGGCGATT ATCTTTCCGT 

501 GTATGTTCCG ACCACGCCGA ATCCGACCGG CGGTTACTAT ATTATGGTAA 

551 AGAAAAGCGA TGTGCGCGAA CTCGATATGA GCGTGGACGA AsCATTGAAA 

601 TATGTGATTT CGCTGGGTAT GGTCATCCCT GACGACCTGC CCGTCAAAAC 

651 ATTGGCAsGA CCTATGCCGT CTGAAAAGGC GGATTTGCCC GAACAACAAT 

701 AA 

This corresponds to the amino acid sequence <SEQ ID 740; ORF98>: 



1 MTVTAAEGGK AAKALKKYLI TGILVWLPIA VTVWVVSYIV SASDQLVNLL 

51 PKQWRPQYVL GFNIPGLGVI VAIAVLFVTG LFAANVLGRQ ILAAWDSLLG 

101 RIPVVKSIYS SVKKVSEYVL SDSSRSFKTP VLVPFPQPGI WTIAFVSGQV 

151 SNAVKAALPX DGDYLSVYVP TTPNPTGGYY IMVKKSDVRE LDMSVDEXLK 

201 YVISLGMVIP DDLPVKTLAX PMPSEKADLP EQQ* 

Further work revealed the complete nucleotide sequence <SEQ ID 741>: 



1 ATGACGGAAC nTGCGGCCGA AGGCGGCAAA GCTGCCAArG CGTTAAAAAA 

51 ATATCTGATT ACGGGCATTT TGGTCTGGCT GCCGATTGCG GTAACGGTTT 

101 GGGTGGTTTC CTATATCGTT TCCGCGTCCG ATCAGCTCGT CAACCTGCTG 

151 CCGAAGCAAT GGCGGCCGCA ATATGTTTTG GGGTTTAATA TCCCGGGGCT 

201 GGGCGTTATC GTTGCCATTG CCGTATTGTT TGTAACCGGA TTGTTTGCCG 

251 CCAACGTATT GGGTCGGCAG ATCCTCGCCG CGTGGGACAG CCTGTTGGGG 

301 CGGATTCCGG TTGTGAAATC CATCTATTCG AGTGTGAAAA AAGTATCCGA 

351 ATCGCTGCTG TCCGACAGCA GCCGTTCGTT TAAAACGCCG GTACTCGTGC 

401 CGTTTCCCCA GCCCGGTATT TGGACGATTG CTTTCGTGTC AGGGCAGGTG 

451 TCGAATGCGG TTAAGGCCGC ATTGCCGAAG GACGGCGATT ATCTTTCCGT 

501 GTATGTTCCG ACCACGCCGA ATCCGACCGG CGGTTACTAT ATTATGGTAA 

551 AGAAAAGCGA TGTGCGCGAA CTCGATATGA GCGTGGACGA AGCATTGAAA 

601 TATGTGATTT CGCTGGGTAT GGTCATCCCT GACGACCTGC CCGTCAAAAC 

651 ATTGGCAGGA CCTATGCCGT CTGAAAAGGC GGATTTGCCC GAACAACAAT 

701 AA 

This corresponds to the amino acid sequence <SEQ ID 742; ORF98-1>: 



1 MTEXAAEGGK AAKALKKYLI TGILVWLPIA VTVWVVSYIV SASDQLVNLL 

51 PKQWRPQYVL GFNIPGLGVI VAIAVLFVTG LFAANVLGRQ ILAAWDSLLG 

101 RIPVVKSIYS SVKKVSESLL SDSSRSFKTP VLVPFPQPGI WTIAFVSGQV 

151 SNAVKAALPK DGDYLSVYVP TTPNPTGGYY IMVKKSDVRE LDMSVDEALK 

201 YVISLGMVIP DDLPVKTLAG PMPSEKADLP EQQ* 
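
The correspondence between a nucleotide sequence and its amino acid sequence can be checked with a short translation routine. The sketch below is an illustration only, not the software used in this work. It assumes the standard genetic code, and it assumes that a codon containing an IUPAC ambiguity code is written as X unless every possible expansion encodes the same residue. The names CODON_TABLE, IUPAC and translate are hypothetical.

```python
# Standard genetic code, built in TCAG order (assumption: no alternative code is used).
BASES = "TCAG"
AMINO = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    a + b + c: AMINO[16 * i + 4 * j + k]
    for i, a in enumerate(BASES)
    for j, b in enumerate(BASES)
    for k, c in enumerate(BASES)
}

# IUPAC nucleotide ambiguity codes and the bases each one can stand for.
IUPAC = {
    "A": "A", "C": "C", "G": "G", "T": "T",
    "R": "AG", "Y": "CT", "S": "CG", "W": "AT", "K": "GT", "M": "AC",
    "B": "CGT", "D": "AGT", "H": "ACT", "V": "ACG", "N": "ACGT",
}

def translate(dna):
    """Translate a DNA string; whitespace is ignored and case is folded."""
    seq = "".join(dna.split()).upper()
    protein = []
    for i in range(0, len(seq) - 2, 3):
        codon = seq[i:i + 3]
        # Expand ambiguity codes; unrecognised characters are treated like N.
        residues = {
            CODON_TABLE[a + b + c]
            for a in IUPAC.get(codon[0], "ACGT")
            for b in IUPAC.get(codon[1], "ACGT")
            for c in IUPAC.get(codon[2], "ACGT")
        }
        # A codon is resolved only if every expansion gives the same residue.
        protein.append(residues.pop() if len(residues) == 1 else "X")
    return "".join(protein)

print(translate("ATGACGGAAC nTGCGGCCGA AGGCGGCAAA GCTGCCAArG CGTTAAAAAA"))
```

Applied to the first 50 nucleotides of SEQ ID 741, this gives MTEXAAEGGKAAKALK, which agrees with the start of SEQ ID 742: the codon CnT is ambiguous and is written X, while AAr encodes lysine whichever purine is present.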

Computer analysis of this amino acid sequence gave the following results: 



Homology with a predicted ORF from N.meningitidis (strain A) 

ORF98 shows 96.1% identity over a 233aa overlap with an ORF (ORF98a) from strain A of N. 
meningitidis: 

10 20 30 40 50 60 

orf 98 . pep MTVTAAEGGKAAKALKKYLITGILVWLPIAVTVWWSYIVSASDQLVNLLPKQWRPQYVL 
II I t I I I I I I I I M II I I I I I I I I I I I I I I I I t I It II I I I I I t I I I I 1 I I I II I I M 
orf 98a MTEPAAEGGKAAKALKKYLITGILVWLPIAVTVWWSYIVSASDQLVNLLPKQWRPQYVL 
10 20 30 40 50 60 






70 80 90 100 110 120 

orf 98 . pep GFNIPGLGVIVAIAVLFVTGLFAANVLGRQILAAWDSLLGRIPWKSIYSSVKKVSEYVL 
illlllMltilMIIIIIIIIIIIIIIIIIIIIIMllllllllllllMtllM :| 
orf 98a GFNIPGLGVIVAIAVLFVTGLFAANVLGRQILAAWDSLLGRIPWKSIYSSVKKVSXSLL 
70 80 90 100 110 120 

130 140 150 160 170 180 

or f 98 . pep SDSSRSFKTPVLVPFPQPGIWTIAFVSGQVSNAVKAALPXDGDYLSVYVPTTPNPTGGYY 
I I I i I I I I I I I I I I I I I I I M I I M i I I I I i I I I I I I I I I I I M i M I I I 1 I I I I t I I 
orf 98a SDSSRSFKTPVLVPFPQSGIWTIAFVSGQVSNAVKAALPKDGDYLSVYVPTTPNPTGGYY 

130 140 150 160 170 180 

190 200 210 220 230 

orf 98 . pep IMVKKSDVRELDMSVDEXLKYVISLGMVIPDDLPVKTLAXPMPSEKADLPEQQX 
I M I I I I I I I i M I I I I t i i I I I I I I I I I I I I I M I I I I I I I I i I I I n I t I 
orf 98a IMVKKSDVRELDMSVDEALKYVISLGMVIPDDLPVKTLAGPMPSEKADLPEQQX 

190 200 210 220 230 

The complete length ORF98a nucleotide sequence <SEQ ID 743> is: 

1 ATGACGGAAC CTGCGGCCGA AGGCGGCAAA GCTGCCAAGG CGTTAAAAAA 

51 ATATCTGATT ACGGGCATTT TGGTCTGGCT GCCGATTGCG GTAACGGTTT 

101 GGGTGGTTTC CTATATCGTT TCCGCGTCCG ATCAGCTCGT CAACCTGCTG 

151 CCGAAGCAAT GGCGGCCGCA ATATGTTTTG GGGTTTAATA TCCCGGGGCT 

201 GGGCGTTATC GTTGCCATTG CCGTATTGTT TGTAACCGGA TTATTTGCCG 

251 CAAACGTATT GGGCCGGCAG ATTCTTGCCG CGTGGGACAG CTTGTTGGGG 

301 CGGATTCCGG TTGTGAAGTC CATCTATTCG AGTGTGAAAA AAGTATCCGA 

351 NTCGTTGCTG TCCGACAGCA GCCGTTCGTT TAAAACACCA GTACTCGTGC 

401 CGTTTCCCCA ATCGGGTATT TGGACAATCG CATTCGTGTC CGGTCAGGTG 

451 TCGAATGCGG TTAAGGCCGC ATTGCCGAAG GACGGCGATT ATCTTTCCGT 

501 GTATGTTCCG ACCACGCCGA ATCCGACCGG CGGTTACTAT ATTATGGTAA 

551 AGAAAAGCGA TGTGCGCGAA CTCGATATGA GCGTGGACGA AGCGTTGAAA 

601 TATGTGATTT CGCTGGGTAT GGTCATCCCT GACGACCTGC CCGTCAAAAC 

651 ATTGGCAGGA CCTATGCCGT CTGAAAAGGC GGATTTGCCC GAACAACAAT 

701 AA 

This encodes a protein having amino acid sequence <SEQ ID 744>: 

1 MTEPAAEGGK AAKALKKYLI TGILVWLPIA VTVWVVSYIV SASDQLVNLL 

51 PKQWRPQYVL GFNIPGLGVI VAIAVLFVTG LFAANVLGRQ ILAAWDSLLG 

101 RIPVVKSIYS SVKKVSXSLL SDSSRSFKTP VLVPFPQSGI WTIAFVSGQV 

151 SNAVKAALPK DGDYLSVYVP TTPNPTGGYY IMVKKSDVRE LDMSVDEALK 

201 YVISLGMVIP DDLPVKTLAG PMPSEKADLP EQQ* 

ORF98a and ORF98-1 show 98.7% identity in 233 aa overlap: 

10 20 30 40 50 60 

or f 98a . pep MTEPAAEGGKAAKALKKYLITGILVWLPIAVTVWWSYIVSASDQLVNLLPKQWRPQYVL 
Ml I I I I I I n t I I I I 1 I I I I I I I I I I I I I i I i I I t I i t I M I I i I I I I I I I I I I I I M 
orf 98-1 MTEXAAEGGKAAKALKKYLITGILVWLPIAVTVWWSYIVSASDQLVNLLPKQWRPQYVL 
10 20 30 40 50 60 

70 80 90 100 110 120 

orf 98a . pep GFNIPGLGVIVAIAVLFVTGLFAANVLGRQILAAWDSLLGRIPWKSIYSSVKKVSXSLL 
I I I I I I I I M I I I I I [ I I 1 I I I I I M t I I I I I I I I I I I I i I I I M I i I I t I I I t I i III 
orf 98-1 GFNIPGLGVIVAIAVLFVTGLFAANVLGRQILAAWDSLLGRIPWKSIYSSVKKVSESLL 
70 80 90 100 110 120 

130 140 150 160 170 180 

or f 98a . pep SDSSRSFKTPVLVPFPQSGIWTIAFVSGQVSNAVKAALPKDGDYLSVYVPTTPNPTGGYY 
[ II II I I II I I I I I I I I II I t I I t I t II I I I I II M I I) I I I I I I I I i I II I I I t I I I I 
orf 98-1 SDSSRSFKTPVLVPFPQPGIWTIAFVSGQVSNAVKAALPKDGDYLSVYVPTTPNPTGGYY 

130 140 150 160 170 180 

190 200 210 220 230 

orf 98a . pep IMVKKSDVRELDMSVDEALKYVISLGMVIPDDLPVKTLAGPMPSEKADLPEQQX 
1 I II II I I I I I I I I II I I I I I I I II I I I I I M I I t I t I II II t I I I I II M I II 
orf 98-1 IMVKKSDVRELDMSVDEALKYVISLGMVIPDDLPVKTLAGPMPSEKADLPEQQX 

190 200 210 220 230 
Homology with a predicted ORF from N.gonorrhoeae 

ORF98 shows 95.3% identity over a 233 aa overlap with a predicted ORF (ORF98ng) from 
N.gonorrhoeae: 

orf98.pep     MTVTAAEGGKAAKALKKYLITGILVWLPIAVTVWVVSYIVSASDQLVNLLPKQWRPQYVL 60
              ||  ||||||||||||||||||||||||||||||||||||||||||||||||||||||||
orf98ng       MTEPAAEGGKAAKALKKYLITGILVWLPIAVTVWVVSYIVSASDQLVNLLPKQWRPQYVL 60

orf98.pep     GFNIPGLGVIVAIAVLFVTGLFAANVLGRQILAAWDSLLGRIPVVKSIYSSVKKVSEYVL 120
              ||||||||||||||||||||||||||||||||||||||| ||||||||||||||||| :|
orf98ng       GFNIPGLGVIVAIAVLFVTGLFAANVLGRQILAAWDSLLXRIPVVKSIYSSVKKVSESLL 120

orf98.pep     SDSSRSFKTPVLVPFPQPGIWTIAFVSGQVSNAVKAALPXDGDYLSVYVPTTPNPTGGYY 180
              ||||||||||||||||| ||||||||||||||||||||| ||||||||||||||||||||
orf98ng       SDSSRSFKTPVLVPFPQSGIWTIAFVSGQVSNAVKAALPQDGDYLSVYVPTTPNPTGGYY 180

orf98.pep     IMVKKSDVRELDMSVDEXLKYVISLGMVIPDDLPVKTLAXPMPSEKADLPEQQ 233
              ||||||||||||||||| ||||||||||||||||||||| ||| |||:|||||
orf98ng       IMVKKSDVRELDMSVDEALKYVISLGMVIPDDLPVKTLAGPMPPEKAELPEQQ 233

The complete length ORF98ng nucleotide sequence <SEQ ID 745> is predicted to encode a protein 
having amino acid sequence <SEQ ID 746>: 

1 MTEPAAEGGK AAKALKKYLI TGILVWLPIA VTVWVVSYIV SASDQLVNLL 

51 PKQWRPQYVL GFNIPGLGVI VAIAVLFVTG LFAANVLGRQ ILAAWDSLLX 

101 RIPVVKSIYS SVKKVSESLL SDSSRSFKTP VLVPFPQSGI WTIAFVSGQV 

151 SNAVKAALPQ DGDYLSVYVP TTPNPTGGYY IMVKKSDVRE LDMSVDEALK 

201 YVISLGMVIP DDLPVKTLAG PMPPEKAELP EQQ* 

Further work revealed the complete nucleotide sequence <SEQ ID 747>: 



1 ATGACGGAAC CTGCGGCCGA AGGCGGCAAA GCTGCCAAGG CGTTAAAAAA 

51 ATATCTGATT ACAGGCATTT TGGTCTGGCT GCCGATTGCG GTAACGGTTT 

101 GGGTGGTTTC CTATATCGTT TCCGCGTCCG ACCAGCTTGT CAACCTGCTG 

151 CCGAAGCAAT GGCGGCCGCA ATATGTTTTG GGGTTTAATA TCCCCGGGCT 

201 CGGCGTTATT GTTGCCATTG CCGTATTGTT TGTAACCGGA TTATTTGCCG 

251 CAAACGTGTT GGGCCGGCAG ATTCTTGCCG CGTGGGACAG CCTGTTgggg 

301 cggaTTCCGG TTGTCAAATC CATCTATTCG AGTGTGAAAA AAGTATCCGA 

351 ATCGCTGCTG TCCGACAGCA GCCGTTCGTT TAAAACGCCG GTACTCGTGC 

401 CGTTTCCCCA ATCGGGTATT TGGACAATCG CATTCGTGTC CGGTCAGGTG 

451 TCGAATGCGG TTAAGGCCGC ATTGCCGCAG GATGGCGATT ATCTTTCCGT 

501 GTATGTCCCG ACCACGCCCA ACCCGACCGG CGGTTACTAT ATTATGGTAA 

551 AGAAAAGCGA TGTGCGCGAA CTCGATATGA GCGTGGACGA AGCGTTGAAA 

601 TATGTGATTT CGCTGGGTAT GGTCATCCCT GACGACCTGC CCGTCAAAAC 

651 ATTGGCAGGA CCTATGCCGC CTGAAAAGGC GGAGTTGCCC GAACAACAAT 

701 AA 

This corresponds to the amino acid sequence <SEQ ID 748; ORF98ng-1>: 



1 MTEPAAEGGK AAKALKKYLI TGILVWLPIA VTVWVVSYIV SASDQLVNLL 

51 PKQWRPQYVL GFNIPGLGVI VAIAVLFVTG LFAANVLGRQ ILAAWDSLLG 

101 RIPVVKSIYS SVKKVSESLL SDSSRSFKTP VLVPFPQSGI WTIAFVSGQV 

151 SNAVKAALPQ DGDYLSVYVP TTPNPTGGYY IMVKKSDVRE LDMSVDEALK 

201 YVISLGMVIP DDLPVKTLAG PMPPEKAELP EQQ* 

ORF98ng-1 and ORF98-1 show 97.9% identity in 233 aa overlap: 

10 20 30 40 50 60 

orf 98-1 . pep MTEXAAEGGKAAKALKKYL I TG I LVWLPIAVTVWWSY I VSAS DQLVNLLPKQWRPQYVL 
III II I I I I II t I I I I II i I I I II I I II I I II M I I I I II I I I I t t I I I I II I I I II M 
orf98ng-l MTEPAAEGGKAAKALKKYLITG I LVWLPIAVTVWWSY IVSAS DQLVNLLPKQWRPQYVL 

10 20 30 40 50 60 



70 80 90 100 110 120 

or f 98-1 . pep GFNIPGLGVIVAIAVLFVTGLFAANVLGRQILAAWDSLLGRIPWKSIYSSVKKVSESLL 
lllllllllllllllllllllllMlltMIMIIIIIIIIIMIIIIIIIItlllllM 
orf98ng-1 GFNIPGLGVIVAIAVLFVTGLFAANVLGRQILAAWDSLLGRIPWKSIYSSVKKVSESLL 
70 80 90 100 110 120 

130 140 150 160 170 180 

orf98-l pep SDSSRSFKTPVLVPFPQPGIWTIAFVSGQVSNAVKAALPKDGDYLSVYVPTTPNPTGGYY 
IIMMIIMIilllll I I I I I I It t I I I I I I I I I I I I: I I I I I I I I i I t I t I I I I I I I 
orf98ng-l SDSSRSFKTPVLVPFPQSGIWTIAFVSGQVSNAVKAALPQDGDYLSVYVPTTPNPTGGYY 

130 140 150 160 170 180 

190 200 210 220 230 

orf 98-1 . pep IMVKKSDVRELDMSVDEALKYVISLGMVIPDDLPVKTLAGPMPSEKADLPEQQX 
I I I I I I I I I I I i I i I I I I I I i t I I I I i I I I t I I i I 1 t I 1 I i I I 111:111111 
orf98ng-1 IMVKKSDVRELDMSVDEALKYVISLGMVIPDDLPVKTLAGPMPPEKAELPEQQX 

190 200 210 220 230 
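
The identity figures quoted for these overlaps can be reproduced by counting matching positions in the two aligned sequences. The following is a minimal sketch, assuming that the two strings are already aligned to equal length and that an ambiguous residue (X) or a gap (-) never counts as a match. The function name percent_identity is hypothetical, and this is not the alignment program used to produce the comparisons above.

```python
def percent_identity(aligned_a, aligned_b):
    """Return (matches, percent identity) for two equal-length aligned sequences."""
    if len(aligned_a) != len(aligned_b):
        raise ValueError("aligned sequences must have the same length")
    # Ambiguous residues and gaps are treated as mismatches (an assumption).
    matches = sum(
        1 for a, b in zip(aligned_a, aligned_b)
        if a == b and a not in "X-"
    )
    return matches, 100.0 * matches / len(aligned_a)

# Residues 181-233 of ORF98-1 and ORF98ng-1, taken from the alignment above.
orf98_1 = "IMVKKSDVRELDMSVDEALKYVISLGMVIPDDLPVKTLAGPMPSEKADLPEQQ"
orf98ng_1 = "IMVKKSDVRELDMSVDEALKYVISLGMVIPDDLPVKTLAGPMPPEKAELPEQQ"
matches, pct = percent_identity(orf98_1, orf98ng_1)
print(matches, len(orf98_1), round(pct, 1))
```

Over the full 233 aa overlap the two sequences differ at five positions (residues 4, 138, 160, 224 and 228), so 228 of 233 positions match, which rounds to the 97.9% stated above.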

Based on this analysis, including the fact that the putative transmembrane domains in the 
gonococcal protein are identical to the sequences in the meningococcal protein, it is predicted that 
the proteins from N.meningitidis and N.gonorrhoeae, and their epitopes, could be useful antigens 
for vaccines or diagnostics, or for raising antibodies. 
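
Putative transmembrane domains of the kind referred to above are commonly located by scanning a protein for stretches of high average hydrophobicity. The sketch below illustrates one such scan. It assumes the Kyte-Doolittle hydropathy scale, a 19-residue window and a threshold of 1.6; the source does not state which method or parameters were used, so all of these are assumptions, and the names KD, mean_hydropathy and hydrophobic_windows are hypothetical.

```python
# Kyte-Doolittle hydropathy values (an assumed scale; the source does not name one).
KD = {
    "A": 1.8, "R": -4.5, "N": -3.5, "D": -3.5, "C": 2.5,
    "Q": -3.5, "E": -3.5, "G": -0.4, "H": -3.2, "I": 4.5,
    "L": 3.8, "K": -3.9, "M": 1.9, "F": 2.8, "P": -1.6,
    "S": -0.8, "T": -0.7, "W": -0.9, "Y": -1.3, "V": 4.2,
}

def mean_hydropathy(segment):
    """Average hydropathy of a segment; unknown residues (e.g. X) count as 0."""
    return sum(KD.get(res, 0.0) for res in segment) / len(segment)

def hydrophobic_windows(protein, window=19, threshold=1.6):
    """Return (1-based start, mean score) for each window at or above threshold."""
    hits = []
    for i in range(len(protein) - window + 1):
        score = mean_hydropathy(protein[i:i + window])
        if score >= threshold:
            hits.append((i + 1, round(score, 2)))
    return hits

# A hydrophobic stretch and a charged stretch, both taken from the ORF98 sequences.
print(mean_hydropathy("GILVWLPIAV"))
print(mean_hydropathy("KKSDVRELDM"))
```

A window that scores well above the threshold is only a candidate membrane-spanning segment; the prediction made in the text rests on the analysis actually performed, not on this sketch.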



Example 89 

The following partial DNA sequence was identified in N.meningitidis <SEQ ID 749>: 

1 ATgAAAACGG TAGTCTGGAT TGTCGTCCTG TTTGCCGCCG CCGTCGGACT 

51 GGCGCTGGCT TCGGGCATTT ACACCGGCGA CGTGTATATC GTACTCGGAC 

101 AGACCATGCT CAGAATCAAC CTGCACGCCT TTGTGTTAGG TTCGCTGATT 

151 GCCGTCGTGG TGTGGTATTT CTTGTTTAAA TTCATTATCG GsGgTACTCA 

201 ATATCCCCGA AAAGATGCAG CGTTTCGGTT CGGCnCGTAA AGGCCkCAAG 

251 ssCGsGCTTG CCTTGAACAA GGCGGGTTTG GCGTATTTTG AAGGGCGTTT 

301 TGAAAAGGCG GAACTAGAAG CCTCACGCGT GTTGGTCAAC AAAGtAGGCC 

351 G^gAGACAAC CGGACTTTGG CATTGATGCT GrGCGCGCAC GCCGCCGGAC 

401 AGATGGAAAA CATCGAssTG CGCGACCGTT ATCTTGCGGA AATCGCCAAA 

451 CTGCCGGAAA AACAGCAGCT TTCCCGTTAT CTTTTGTTGG CGGAATCGGC 

501 GTTGAACCGG CGCGATTACG AAGCGGCGGA AGCCAATCTT CATGCGGCGG 

551 CGAAGATGAA TGCCAACCTT ACGCGCCTCG TGCGTCTGCA .ATTCGTTAC 

601 GCTTTCGACA GGGGCGACGC GTTGCAGGTT CTGGCAAAAA CCGAAAAACT 

651 TTCCAAGGCG GGCGCGTTGG GCAAATCGGA AATGGAACGG TATCAAAATT 

701 GGGCATATCC GTCGCCAGCT GGCGGATGCT GCCGATGCCG CCGCTTTGAA 

751 AACCTGCCTG AAGCGGATTC CCGACAGCCT CAAAAACGGG GAATTGAGCG 

801 TATCGGTTGC GGAAAAGTAC GAACGTTTGG GACTGTATGC CGATGCGGTC 

851 AAATGGGTCA AACAGCATTA TCCGCAsAAC CGCCGCCCCG AGCTTTTGGA 

901 AGCCTTTGTC GAAAGCGTGC GCTTTTTGGG CGAGCGCGAA CAGCAGAAAG 

951 CCATCGATTT TGCCGATGCT TGGCTGAAAG AACAGCCCGA TAACGCGCTT 

1001 CTGCTGATGT ATCTCGGTCG GCTCGCCTTC GGCCGCAAAC TTTGGGGCAA 

1051 GGCAAAAGGC TACCTTGAAG CGAGCATTGC ATTAAAGCCG AGTATTTCCG 

1101 CGCGTTTGGT TCTAACAAAG GTTTTCGACG AAATCGGAGA ACCGCAGAAG 

1151 GCGGAGGCGC AC... 

This corresponds to the amino acid sequence <SEQ ID 750; ORF100>: 

1 MKTWWIWL FAAAVGLALA SGIYTGDVYI VLGQTMLRIN LHAFVLGSLI 

51 AWVWYFLFK FIIGVLNIPE KMQRFGSARK GXKXXLALNK AGLAYFEGRF 

101 EKAELEASRV LVNKVGRDNR TLALMLXAHA AGQMENIXXR DRYIAEIAKL 

151 PEKQQLSRYL LLAESALNRR DYEAAEANLH AAAKMNANLT RLVRLXIRYA 

201 FDRGDALQVL AKTEKLSKAG ALGKSEMERY QNWAYRRQLA DAADAAALKT 

251 CLKRIPDSLK NGELSVSVAE KYERLGLYAD AVKWVKQHYP XNRRPELLEA 

301 FVESVRFLGE REQQKAIDFA DAWLBCEQPDN ALLLMYLGRL AFGRKLWGKA 

351 KGYLEASIAL KPSISARLVL TKVFDEIGEP QKAEAH. . . 

Further work revealed the complete nucleotide sequence <SEQ ID 751>: 

1 ATGAAAACGG TAGTCTGGAT TGTCGTCCTG TTTGCCGCCG CCGTCGGACT 

51 GGCGCTGGCT TCGGGCATTT ACACCGGCGA CGTGTATATC GTACTCGGAC 

101 AGACCATGCT CAGAATCAAC CTGCACGCCT TTGTGTTAGG TTCGCTGATT 

151 GCCGTCGTGG TGTGGTATTT CTTGTTTAAA TTCATTATCG GCGTACTCAA 

201 TATCCCCGAA AAGATGCAGC GTTTCGGTTC GGCGCGTAAA GGCCGCAAGG 

251 CCGCGCTTGC CTTGAACAAG GCGGGTTTGG CGTATTTTGA AGGGCGTTTT 

301 GAAAAGGCGG AACTAGAAGC CTCACGCGTG TTGGTCAACA AAGAGGCCGG 

351 AGACAACCGG ACTTTGGCAT TGATGCTGGG CGCGCACGCC GCCGGACAGA 

401 TGGAAAACAT CGAGCTGCGC GACCGTTATC TTGCGGAAAT CGCCAAACTG 

451 CCGGAAAAAC AGCAGCTTTC CCGTTATCTT TTGTTGGCGG AATCGGCGTT 

501 GAACCGGCGC GATTACGAAG CGGCGGAAGC CAATCTTCAT GCGGCGGCGA 

551 AGATGAATGC CAACCTTACG CGCCTCGTGC GTCTGCAACT TCGTTACGCT 

601 TTCGACAGGG GCGACGCGTT GCAGGTTCTG GCAAAAACCG AAAAACTTTC 

651 CAAGGCGGGC GCGTTGGGCA AATCGGAAAT GGAACGGTAT CAAAATTGGG 

701 CATACCGCCG CCAGCTGGCG GATGCTGCCG ATGCCGCCGC TTTGAAAACC 

751 TGCCTGAAGC GGATTCCCGA CAGCCTCAAA AACGGGGAAT TGAGCGTATC 

801 GGTTGCGGAA AAGTACGAAC GTTTGGGACT GTATGCCGAT GCGGTCAAAT 

851 GGGTCAAACA GCATTATCCG CACAACCGCC GCCCCGAGCT TTTGGAAGCC 

901 TTTGTCGAAA GCGTGCGCTT TTTGGGCGAG CGCGAACAGC AGAAAGCCAT 

951 CGATTTTGCC GATGCTTGGC TGAAAGAACA GCCCGATAAC GCGCTTCTGC 

1001 TGATGTATCT CGGTCGGCTC GCCTACGGCC GCAAACTTTG GGGCAAGGCA 

1051 AAAGGCTACC TTGAAGCGAG CATTGCATTA AAGCCGAGTA TTTCCGCGCG 

1101 TTTGGTTCTA GCAAAGGTTT TCGACGAAAT CGGAGAACCG CAGAAGGCGG 

1151 AGGCGCAGCG CAACTTGGTT TTGGAAGCCG TCTCCGATGA CGAACGTCAC 

1201 GCAGCGTTAG AGCAGCATAG CTGA 



This corresponds to the amino acid sequence <SEQ ID 752; ORF100-1>: 



1 MKTVVWIVVL FAAAVGLALA SGIYTGDVYI VLGQTMLRIN LHAFVLGSLI 

51 AVVVWYFLFK FIIGVLNIPE KMQRFGSARK GRKAALALNK AGLAYFEGRF 

101 EKAELEASRV LVNKEAGDNR TLALMLGAHA AGQMENIELR DRYLAEIAKL 

151 PEKQQLSRYL LLAESALNRR DYEAAEANLH AAAKMNANLT RLVRLQLRYA 

201 FDRGDALQVL AKTEKLSKAG ALGKSEMERY QNWAYRRQLA DAADAAALKT 

251 CLKRIPDSLK NGELSVSVAE KYERLGLYAD AVKWVKQHYP HNRRPELLEA 

301 FVESVRFLGE REQQKAIDFA DAWLKEQPDN ALLLMYLGRL AYGRKLWGKA 

351 KGYLEASIAL KPSISARLVL AKVFDEIGEP QKAEAQRNLV LEAVSDDERH 

401 AALEQHS* 

Computer analysis of this amino acid sequence gave the following results: 



Homology with a predicted ORF from N.meningitidis (strain A) 

ORF100 shows 93.5% identity over a 386aa overlap with an ORF (ORF100a) from strain A of N. 
meningitidis: 



10 20 30 40 50 60 

orf 100 . pep MKTVVWIVVLFAAAVGLALASGIYTGDVYIVLGQTMLRINLHAFVLGSLIAVVVWYFLFK 

lllilltlllllli I) lilllllllllllllllilllllllllMIIIIIII 

orf 100a MKTVVWIVVLFAAAXGLALASGIXTGDVYIVLGQTMLRINLHAFVLGSLIAVVVWYFLFK 

10 20 30 40 50 60 



70 80 90 100 110 120 

or f 100 . pep FIIGVLNIPEKMQRFGSARKGXKXXLALNKAGLAYFEGRFEKAELEASRVLVNKVGRDNR 
lllitil t I I I i I I I I I I I I 1 ItiltltllilMitMIIMIilll II : III 
orf 100a FIIGVLNXPEKMQRFGSARKGRKAALALNKAGLAYFEGRFEKAELEASRVLGNKEAGDNR 

70 80 90 100 110 120 



130 140 150 160 170 180 

orf100.pep    TLALMLXAHAAGQMENIXXRDRYLAEIAKLPEKQQLSRYLLLAESALNRRDYEAAEANLH 
              |||||| ||||||||||  |||||||||||||||||||||||||||||||||||||||||
orf100a       TLALMLGAHAAGQMENIELRDRYLAEIAKLPEKQQLSRYLLLAESALNRRDYEAAEANLH 
130 140 150 160 170 180 



190 200 210 220 230 240 

or f 100 . pep AAAKMNANLTRLVRLXIRYAFDRGDALQVLAKTEKLSKAGALGKSEMERYQNWAYRRQLA 

II I I I I I I I I II t II : I I I I I I I I I I I I M I II i I M II I I I I I It I I I I I I I I I I 
orf 100a AAAKMNANLTRLVRLQLRYAFDRGDALQVLAKTEKXSKAGAXGKSEMERYQNWAYRRQLX 
190 200 210 220 230 240 

250 260 270 280 290 300 

orf 100 . pep DAADAAALKTCLKRIPDSLKNGELSVSVAEKYERLGLYADAVKWVKQHYPXNRRPELLEA 

IIIIIMtlMlltMlllllllllllltllllillMIIIII I IIIIIIMI 

orf 100a DAADAAALKTCLKRIPDSLKNGELSVSVAEKYERLGLYADAVKWVKQHYPHNRRPELLEA 
250 260 270 280 290 300 
310 320 330 340 350 360 

orf100.pep    FVESVRFLGEREQQKAIDFADAWLKEQPDNALLLMYLGRLAFGRKLWGKAKGYLEASIAL 
              |||||||||||:|||||||||||||||||||||| ||||||:||||||||||||||||||
orf100a       FVESVRFLGERDQQKAIDFADAWLKEQPDNALLLXYLGRLAYGRKLWGKAKGYLEASIAL 

310 320 330 340 350 360 



                     370       380
orf100.pep    KPSISARLVLTKVFDEIGEPQKAEAH 
              ||||||||||:||||| ||||||||:
orf100a       KPSISARLVLAKVFDETGEPQKAEAQRNLVLASVAEENRPSAETHX 
                     370       380       390       400

The complete length ORF100a nucleotide sequence <SEQ ID 753> is: 

1 ATGAAAACGG TAGTCTGGAT TGTCGTCCTG TTTGCCGCCG CNNTCGGGCT 

51 GGCATTGGCG TCGGGCATTN ACACCGGCGA CGTGTATATC GTACTCGGAC 

101 AGACCATGCT CAGAATCAAC CTGCACGCCT TTGTGTTAGG TTCGCTGATT 

151 GCCGTCGTGG TGTGGTATTT CCTGTTCAAA TTCATCATCG GCGTACTCAA 

201 TANCCCCGAA AAGATGCAGC GTTTCGGTTC GGCGCGTAAA GGCCGCAAGG 

251 CCGCGCTTGC TTTGAACAAG GCGGGTTTGG CGTATTTTGA AGGGCGTTTT 

301 GAAAAGGCGG AACTTGAAGC CTCGCGCGTA TTGGGAAACA AAGAGGCGGG 

351 GGATAACCGG ACTTTGGCAT TGATGTTGGG CGCACATGCC GCCGGGCAGA 

401 TGGAAAACAT CGAGCTGCGC GACCGTTATC TTGCGGAAAT CGCCAAACTG 

451 CCGGAAAAGC AGCAGCTTTC CCGTTATCTT TTGTTGGCGG AATCGGCGTT 

501 GAACCGGCGC GATTACGAAG CGGCGGAAGC CAATCTTCAT GCGGCGGCGA 

551 AGATGAATGC CAACCTTACG CGCCTCGTGC GTCTGCAACT TCGTTACGCT 

601 TTCGACAGGG GCGACGCGTT GCAGGTTCTG GCAAAAACCG AAAAANTTTC 

651 CAAGGCGGGC GCGTNGGGCA AATCGGAAAT GGAACGGTAT CAAAATTGGG 

701 CATACCGCCG CCAGCTGNCG GATGCTGCCG ATGCCGCCGC TTTGAAAACC 

751 TGCCTGAAGC GGATTCCCGA CAGCCTCAAA AACGGGGAAT TGAGCGTATC 

801 GGTTGCGGAA AAGTACGAAC GTTTGGGACT GTATGCCGAT GCGGTCAAAT 

851 GGGTCAAACA GCATTATCCG CACAACCGCC GACCCGAACT TTTGGAAGCN 

901 TTTGTCGAAA GCGTGCGCTT TTTGGGCGAA CGCGATCAGC AGAAAGCCAT 

951 CGATTTTGCC GATGCTTGGC TGAAAGAACA GCCCGATAAT GCGCTTCTGC 

1001 TGANGTATCT CGGTCGGCTC GCCTACGGCC GCAAACTTTG GGGCAAGGCA 

1051 AAAGGCTACC TTGAAGCGAG CATTGCATTA AAGCCGAGTA TTTCCGCGCG 

1101 TTTGGTTCTG GCAAAGGTTT TTGACGAAAC CGGAGAACCG CAGAAGGCGG 

1151 AGGCGCAGCG CAACTTGGTT TTGGCAAGCG TTGCCGAGGA AAACCGNCCT 

1201 TCCGCCGAAA CCCATTGA 



This encodes a protein having amino acid sequence <SEQ ID 754>: 



1 MKTVVWIVVL FAAAXGLALA SGIXTGDVYI VLGQTMLRIN LHAFVLGSLI 

51 AVVVWYFLFK FIIGVLNXPE KMQRFGSARK GRKAALALNK AGLAYFEGRF 

101 EKAELEASRV LGNKEAGDNR TLALMLGAHA AGQMENIELR DRYLAEIAKL 

151 PEKQQLSRYL LLAESALNRR DYEAAEANLH AAAKMNANLT RLVRLQLRYA 

201 FDRGDALQVL AKTEKXSKAG AXGKSEMERY QNWAYRRQLX DAADAAALKT 

251 CLKRIPDSLK NGELSVSVAE KYERLGLYAD AVKWVKQHYP HNRRPELLEA 

301 FVESVRFLGE RDQQKAIDFA DAWLKEQPDN ALLLXYLGRL AYGRKLWGKA 

351 KGYLEASIAL KPSISARLVL AKVFDETGEP QKAEAQRNLV LASVAEENRP 

401 SAETH* 



ORF100a and ORF100-1 show 95.1% identity in 406 aa overlap: 

10 20 30 40 50 60 

orf100a.pep   MKTVVWIVVLFAAAXGLALASGIXTGDVYIVLGQTMLRINLHAFVLGSLIAVVVWYFLFK 
              |||||||||||||| |||||||| ||||||||||||||||||||||||||||||||||||
orf100-1      MKTVVWIVVLFAAAVGLALASGIYTGDVYIVLGQTMLRINLHAFVLGSLIAVVVWYFLFK 

10 20 30 40 50 60 



70 80 90 100 110 120 

orf 100a . pep FIIGVLNXPEKMQRFGSARKGRKAALALNKAGLAYFEGRFEKAELEASRVLGNKEAGDNR 
I I i I 1 I I M t I I t I I i M I I I I I I I I I t I t I It t I I I I I I I I 1 I I i I I I I I I M I I I I 
or f 1 00 - 1 FI IGVLNI PEKMQRFGSARKGRKAALALNKAGLAYFEGRFEKAELEASRVLVNKEAGDNR 

70 80 90 100 110 120 



130 140 150 160 170 180 

orf100a.pep   TLALMLGAHAAGQMENIELRDRYLAEIAKLPEKQQLSRYLLLAESALNRRDYEAAEANLH 
              ||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
orf100-1      TLALMLGAHAAGQMENIELRDRYLAEIAKLPEKQQLSRYLLLAESALNRRDYEAAEANLH 
                     130       140       150       160       170       180



190 200 210 220 230 240 

or f 1 00a . pep AAAKMNANLTRLVRLQLRYAFDRGDALQVLAKTEKXSKAGAXGKSEMERYQNWAYRRQLX 

I I I I I I I t I I I I I I I I I I I I I I I I I I I I I I I I I I I Mill I II I I I II II I t I I II I 
orflOO-1 AAAKMNANLTRLVRLQLRYAFDRGDALQVLAKTEKLSKAGALGKSEMERYQNWAYRRQLA 

190 200 210 220 230 240 

250 260 270 280 290 300 

orf 100a . pep DAADAAALKTCLKRIPDSLKNGELSVSVAEKYERLGLYADAVKWVKQHYPHNRRPELLEA 
I t I t I M II II 1 I II i M M I I II I II II I I I I I I I I I I I I I I t II I t II I t I I I I I II I 
orf 100-1 DAADAAALKTCLKRIPDSLKNGELSVSVAEKYERLGLYADAVKWVKQHYPHNRRPELLEA 

250 260 270 280 290 300 

310 320 330 340 350 360 

orf 100a . pep FVESVRFLGERDQQKAIDFADAWLKEQPDNALLLXYLGRLAYGRKLWGKAKGYLEASIAL 
IIIIMIIIIhllllllllllllllllllllll I I I 1 1 I I I II I I I II I i I M I I I II 

orf 100-1 FVESVRFLGEREQQKAIDFADAWLKEQPDNALLLMYLGRLAYGRKLWGKAKGYLEASIAL 

310 320 330 340 350 360 

370 380 390 400 

orf 100a . pep KPSISARLVLAKVFDETGEPQKAEAQRNLVLASVAEENRPSA-ETHX 

I I I I I t I I I I I I I I I i i I I I I I I I I I I I I t :|::::t :| I I 
orf 100-1 KPSISARLVLAKVFDEIGEPQKAEAQRNLVLEAVSDDERHAALEQHSX 

370 380 390 400 



Homology with a predicted ORF from N.gonorrhoeae 

ORF100 shows 93.3% identity over a 386 aa overlap with a predicted ORF (ORF100ng) from 
N.gonorrhoeae: 

orf100.pep    MKTVVWIVVLFAAAVGLALASGIYTGDVYIVLGQTMLRINLHAFVLGSLIAVVVWYFLFK 60
              ||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
orf100ng      MKTVVWIVVLFAAAVGLALASGIYTGDVYIVLGQTMLRINLHAFVLGSLIAVVVWYFLFK 60

orf100.pep    FIIGVLNIPEKMQRFGSARKGXKXXLALNKAGLAYFEGRFEKAELEASRVLVNKVGRDNR 120
              ||||||||||:|:| |||||| |  |||||||||||||||||||||||||| || : |||
orf100ng      FIIGVLNIPENMRRSGSARKGRKAALALNKAGLAYFEGRFEKAELEASRVLGNKEAGDNR 120

orf100.pep    TLALMLXAHAAGQMENIXXRDRYLAEIAKLPEKQQLSRYLLLAESALNRRDYEAAEANLH 180
              |||||| ||||||||||  |||||||||||||||||||||||||||||||||||||||||
orf100ng      TLALMLGAHAAGQMENIELRDRYLAEIAKLPEKQQLSRYLLLAESALNRRDYEAAEANLH 180

orf100.pep    AAAKMNANLTRLVRLXIRYAFDRGDALQVLAKTEKLSKAGALGKSEMERYQNWAYRRQLA 240
              ||||||||||||||| :|||||||||||||||||||||||||||||||||||||||||:|
orf100ng      AAAKMNANLTRLVRLQLRYAFDRGDALQVLAKTEKLSKAGALGKSEMERYQNWAYRRQMA 240

orf100.pep    DAADAAALKTCLKRIPDSLKNGELSVSVAEKYERLGLYADAVKWVKQHYPXNRRPELLEA 300
              |||||||||||||||||||||||||||||||||||||||||||||||||| |||||||||
orf100ng      DAADAAALKTCLKRIPDSLKNGELSVSVAEKYERLGLYADAVKWVKQHYPHNRRPELLEA 300

orf100.pep    FVESVRFLGEREQQKAIDFADAWLKEQPDNALLLMYLGRLAFGRKLWGKAKGYLEASIAL 360
              |||||||||||||||||||||:|||||||||||||||||||:||||||||||||||||||
orf100ng      FVESVRFLGEREQQKAIDFADSWLKEQPDNALLLMYLGRLAYGRKLWGKAKGYLEASIAL 360

orf100.pep    KPSISARLVLTKVFDEIGEPQKAEAH 386
              |||| |||||:||||| :: |||||:
orf100ng      KPSIPARLVLAKVFDETAQSQKAEAQRNLVLASVAGENRPSAETR 405

The complete length ORF100ng nucleotide sequence <SEQ ID 755> is: 

1 ATGAAAACGG TAGTCTGGAT TGTTGTCCTG TTTGCCGCCG CCGTCGGACT 

51 GGCGCTGGCT TCGGGCATTT ACACCGGCGA CGTGTATATC GTACTCGGAC 

101 AGACCATGCT CAGAATCAAC CTGCACGCCT TTGTGTTAGG TTCGCTGATT 

151 GCCGTCGTGG TGTGGTATTT CCTGTTTAAA TTCATCATCG GCGTACTCAA 

201 TATCCCCGAA AATATGCGGC GTTCCGGTTC GGCGCGGAAA GGCCGCAAGG 

251 CCGCGCTTGC CTTGAATAAG GCGGGTTTGG CGTATTTCGA AGGGCGTTTT 

301 GAAAAGGCGG AACTCGAAGC CTCTCGAGTG TTGGGCAACA AAGAGGCCGG 

351 AGACAACCGG ACTTTGGCAT TGATGCTGGG CGCGCACGCG GCAGGACAGA 

401 TGGAAAATAT CGAGCTGCGC GACCGTTATC TTGCGGAAAT CGCCAAACTG 

451 CCGGAAAAAC AGCAGCTTTC CCGCTATCTT CTGCTGGCGG AATCGGCGTT 

501 AAACCGGCGC GATTACGAAG CGGCGGAAGC CAATCTTCAT GCGGCGGCGA 

551 AGATGAATGC CAACCTTACG CGCCTCGTGC GTCTGCAACT TCGTTACGCC 

601 TTCGATCGGG GCGATGCGTT GCAGGTTCTG GCAAAAaccG AAAAACTTTC 

651 CAAGGCGGGC GCGTTGGGCA AATCGGAAAT GGAACGGTAT CAAAATTGGG 

701 CATACCGCCG CCAGATGGCG GATGCTGCCG ATGCCGCCGC TTTGAAAACC 

751 TGCCTGAAGC GGATTCCCGA CAGCCTCAAA AACGGGGAAT TGagcGTATC 

801 GGTTGCGGAA AAGTACGAAC GTTTGGGACT GTATGCCGAT GCGGTCAAAT 

851 GGGTCAAACA GCATTATCCG CACAACCGCC GCCCCGAGCT TTTGGAAGCC 

901 TTTGTCGAAA GCGTGCGCTT TTTGGGCGAG CGCGAACAGC AGAAAGCCAT 

951 CGATTTTGCC GATTCTTGGC TGAAAGAACA GCCCGATAAC GCGCTTCTGC 

1001 TGATGTATCT CGGCCGGCTC GCCTACGGCC GCAAACTTTG GGGTAAGGCA 

1051 AAAGGCTACC TTGAAGCGAG TATTGCACTG AAGCCGAGTA TTCCGGCGCG 

1101 TTTGGTGTTG GCAAAGGTTT TTGACGAAAC CGCACAGTCG CAAAAAGCCG 

1151 AAGCACAGCG CAACTTGGTT TTGGCAAGCG TTGCCGGGGA AAACCGCCCT 

1201 TCCGCCGAAA CCCGTTGA 

This encodes a protein having amino acid sequence <SEQ ID 756>: 

1 MKTVVWIVVL FAAAVGLALA SGIYTGDVYI VLGQTMLRIN LHAFVLGSLI 

51 AVVVWYFLFK FIIGVLNIPE NMRRSGSARK GRKAALALNK AGLAYFEGRF 

101 EKAELEASRV LGNKEAGDNR TLALMLGAHA AGQMENIELR DRYLAEIAKL 

151 PEKQQLSRYL LLAESALNRR DYEAAEANLH AAAKMNANLT RLVRLQLRYA 

201 FDRGDALQVL AKTEKLSKAG ALGKSEMERY QNWAYRRQMA DAADAAALKT 

251 CLKRIPDSLK NGELSVSVAE KYERLGLYAD AVKWVKQHYP HNRRPELLEA 

301 FVESVRFLGE REQQKAIDFA DSWLKEQPDN ALLLMYLGRL AYGRKLWGKA 

351 KGYLEASIAL KPSIPARLVL AKVFDETAQS QKAEAQRNLV LASVAGENRP 

401 SAETR* 

ORF100ng and ORF100-1 show 95.3% identity in 402 aa overlap: 



10 20 30 40 50 60 

orf100-1.pep  MKTVVWIVVLFAAAVGLALASGIYTGDVYIVLGQTMLRINLHAFVLGSLIAVVVWYFLFK 
              ||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
orf100ng      MKTVVWIVVLFAAAVGLALASGIYTGDVYIVLGQTMLRINLHAFVLGSLIAVVVWYFLFK 

10 20 30 40 50 60 



70 80 90 100 110 120 

orf 100-1 . pep FIIGVLNIPEKMQRFGSARKGRKAALALNKAGLAYFEGRFEKAELEASRVLVNKEAGDNR 
I M I I I I I M : I: I I I I I 1 t I I [ i I I I I I I I I I I I I I t M I I I t I i I I I I I I I I I i M 
orflOOng FIIGVLNIPENMRRSGSARKGRKAALALNKAGLAYFEGRFEKAELEASRVLGNKEAGDNR 

70 80 90 100 110 120 



130 140 150 160 170 180 

orf100-1.pep  TLALMLGAHAAGQMENIELRDRYLAEIAKLPEKQQLSRYLLLAESALNRRDYEAAEANLH 
              ||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
orf100ng      TLALMLGAHAAGQMENIELRDRYLAEIAKLPEKQQLSRYLLLAESALNRRDYEAAEANLH 

130 140 150 160 170 180 



190 200 210 220 230 240 

orf 100-1 . pep AAAKMNANLTRLVRLQLRYAFDRGDALQVLAKTEKLSKAGALGKSEMERYQNWAYRRQLA 
I I 1 I I [ I I I I I I I I I [ I I I I I I t I I I I I I I I I I I I I I I I ! I I I I I I I I I I i I I t I I I I : I 
orf 1 0 Ong AAAKMNANLTRLVRLQLRYAFDRGDALQVLAKTEKLSKAGALGKSEMERYQNWAYRRQMA 

190 200 210 220 230 240 



250 260 270 280 290 300 

orf100-1.pep  DAADAAALKTCLKRIPDSLKNGELSVSVAEKYERLGLYADAVKWVKQHYPHNRRPELLEA 
              ||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
orf100ng      DAADAAALKTCLKRIPDSLKNGELSVSVAEKYERLGLYADAVKWVKQHYPHNRRPELLEA 

250 260 270 280 290 300 



310 320 330 340 350 360 

orf 100-1 . pep FVESVRFLGEREQQKAIDFADAWLKEQPDNALLLMYLGRLAYGRKLWGKAKGYLEASIAL 
M i I M I I I I I I I I M I I I I I : M I t I I I I I I I I I M t I I I 1 I M I I I t M i I 1 I I t I I I 
orflOOng FVESVRFLGEREQQKAIDFADSWLKEQPDNALLLMYLGRLAYGRKLWGKAKGYLEASIAL 

310 320 330 340 350 360 



370 380 390 400 

orflOO-1. pep KPSISARLVLAKVFDEIGEPQKAEAQRNLVLEAVSDDERHAALEQHSX 

MM MMMMMI :: MMMMMI :|: ::l M 
orflOOn KPSIPARLVLAKVFDETAQSQKAEAQRNLVLASVAGENRPSAETRX 
370 380 390 400 

Based on this analysis, including the presence of a putative leader sequence, a putative 
transmembrane domain, and an RGD motif, it is predicted that the proteins from N.meningitidis and 
N.gonorrhoeae, and their epitopes, could be useful antigens for vaccines or diagnostics, or for 
raising antibodies. 
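
The position of a short motif such as RGD can be located with a simple string search. The sketch below is illustrative only: the function name find_motif and the first_position parameter (the residue number of the first character of the fragment being searched) are hypothetical and are not taken from the analysis software used here.

```python
def find_motif(protein, motif, first_position=1):
    """Return residue numbers at which each occurrence of motif begins."""
    positions = []
    index = protein.find(motif)
    while index != -1:
        positions.append(index + first_position)
        # Continue one residue further on so overlapping occurrences are found.
        index = protein.find(motif, index + 1)
    return positions

# Residues 201-220 of ORF100-1 (SEQ ID 752), as listed above.
fragment = "FDRGDALQVLAKTEKLSKAG"
print(find_motif(fragment, "RGD", first_position=201))
```

For this fragment the search reports the RGD motif beginning at residue 203 of ORF100-1.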



Example 90 

The following DNA sequence, believed to be complete, was identified in N.meningitidis <SEQ ID 
757>: 

1 ATGATGTTTT CTTGGTTCAA GCTGTTTCAC TTGTTTTTTG TCATTTCGTG 

51 GTTTGCAGGG CTGTTTTACC TGCCGAGGAT TTTCGTCAAT ATGGCGATGA 

101 TTGATGTGCC GCGCGGCAAT CCCGAGTATG TGCGTCTGTC GGGCATGGCG 

151 GTGCGGCTGT ACCGTTTTAT GTCGCCGTTG GGCTTCGGCG CGGTCGTGTT 

201 CGGCGCGGCG ATACCGTTTG CCGCCGGCTG GTGGGGCAGC GGCTGGGTAC 

251 ACGTCAAACT GTGTTTGGGC TTGATGCTCT TGGCTTACCA GTTGTATTGC 

301 GGCGTGCTGC TGCGCCGTTT TCAGGATTAC AGCAATGCTT TTTCACACCG 

351 CTGGTACCGC GTGTTCAACG AAATCCCCGT GCTGCTGATG GTTGCCGCGC 

401 TGTATsTGGT CGTGTTCAAA CCGTTTTGA 

This corresponds to the amino acid sequence <SEQ ID 758; ORF102>: 

1 MMFSWFKLFH LFFVISWFAG LFYLPRIFVN MAMIDVPRGN PEYVRLSGMA 
51 VRLYRFMSPL GFGAVVFGAA IPFAAGWWGS GWVHVKLCLG LMLLAYQLYC 
101 GVLLRRFQDY SNAFSHRWYR VFNEIPVLLM VAALYXVVFK PF* 

Further work revealed the complete nucleotide sequence <SEQ ID 759>: 

1 ATGATGTTTT CTTGGTTCAA GCTGTTTCAC TTGTTTTTTG TCATTTCGTG 

51 GTTTGCAGGG CTGTTTTACC TGCCGAGGAT TTTCGTCAAT ATGGCGATGA 

101 TTGATGTGCC GCGCGGCAAT CCCGAGTATG TGCGTCTGTC GGGCATGGCG 

151 GTGCGGCTGT ACCGTTTTAT GTCGCCGTTG GGCTTCGGCG CGGTCGTGTT 

201 CGGCGCGGCG ATACCGTTTG CCGCCGGCTG GTGGGGCAGC GGCTGGGTAC 

251 ACGTCAAACT GTGTTTGGGC TTGATGCTCT TGGCTTACCA GTTGTATTGC 

301 GGCGTGCTGC TGCGCCGTTT TCAGGATTAC AGCAATGCTT TTTCACACCG 

351 CTGGTACCGC GTGTTCAACG AAATCCCCGT GCTGCTGATG GTTGCCGCGC 

401 TGTATCTGGT CGTGTTCAAA CCGTTTTGA 

This corresponds to the amino acid sequence <SEQ ID 760; ORF102-1>: 

1 MMFSWFKLFH LFFVISWFAG LFYLPRIFVN MAMIDVPRGN PEYVRLSGMA 
51 VRLYRFMSPL GFGAVVFGAA IPFAAGWWGS GWVHVKLCLG LMLLAYQLYC 
101 GVLLRRFQDY SNAFSHRWYR VFNEIPVLLM VAALYLVVFK PF* 

Computer analysis of this amino acid sequence gave the following results: 

Homology with HP1484 hypothetical integral membrane protein of H.pylori (accession number AE000647) 
ORF102 and HP1484 show 33% aa identity in 143aa overlap: 

Orfl02 3 FSWFKLFHLFFVISWFAGLFYLPRIFVNMAMIDVPRGNPEYVRLSGMAVRLYRFMSPLGF 62 

F W K FH+ VISW A LFYLPR+FV A + V++ +LY F++ 

HP1484 8 FLWVKAFHVIAVISWMAALFYLPRLFVYHAENAHKKEFVGWQIQEK—KLYSFIASPAM 65 

orfl02 63 GAWFGAAIPFAAG WWGSGWVHVKLCLGLMLLAYQLYCGVLLRRFQDYSNAFSHRWY 119 

G + + + GW+H KL L ++LLAY YC +R + + R+Y 

HP1484 66 GFTLITGILMLLIEPTLFKSGGWLHAKLALWLLLAYHFYCKKCMRELEKDPTRRNARFY 125 

orfl02 120 RVFNEIPXXXXXXXXXXXXFKPF 142 

RVFNE P ' KPF 

HP1484 126 RVFNEAPTILMILIVILVWKPF 148 
Homology with a predicted ORF from N.meningitidis (strain A) 

ORF102 shows 99.3% identity over a 142aa overlap with an ORF (ORF102a) from strain A of N. 
meningitidis: 

10 20 30 40 50 60 

orf 102 . pep MMFSWFKLFHLFFVISWFAGLFYLPRIFVNMAMIDVPRGNPEYVRLSGMAVRLYRFMSPL 
I t t i I i I I I I 1 I I I t I I I I I I I I I I I I I I i I I I i I I I I I I t I I I It I I I I M I I I I I I I I 
or f 1 0 2 a MMFSWFKLFHLFFVI S WFAGL FYLPRI FVNMAMI DVPRGN PE YVRLSGMAVRLYRFMS PL 

10 20 30 40 50 60 

70 80 90 100 110 120 

orf 102 . pep GFGAWFGAAIPFAAGWWGSGWVHVKLCLGLMLLAYQLYCGVLLRRFQDYSNAFSHRWYR 
M I I I I I I I t I I I I t I I t I ) i I I I I I I I I I I I I I I I i t I I I t I I I I 1 i I I I I I I I I I i I I 
orf 102a GFGAWFGAAIPFAAGWWGSGWVHVKLCLGLMLLAYQLYCGVLLRRFQDYSNAFSHRWYR 

70 80 90 100 110 120 

130 140 
orf 102 . pep VFNEIPVLLMVAALYXWFKPFX 
I I I M I I I I I I I I I I I t I i I I I 
orf 102a VFNEIPVLLMVAALYLWFKPFX 

130 140 

The complete length ORF102a nucleotide sequence <SEQ ID 761> is: 

1 ATGATGTTTT CTTGGTTCAA GCTGTTTCAC TTGTTTTTTG TCATTTCGTG 

51 GTTTGCAGGG CTGTTTTACC TGCCGAGGAT TTTCGTCAAT ATGGCGATGA 

101 TTGATGTGCC GCGCGGCAAT CCCGAGTATG TGCGTCTGTC GGGCATGGCG 

151 GTGCGGCTGT ACCGTTTTAT GTCGCCGTTG GGCTTCGGCG CGGTCGTGTT 

201 CGGCGCGGCG ATACCGTTTG CCGCCGGCTG GTGGGGCAGC GGCTGGGTAC 

251 ACGTCAAACT GTGTTTGGGC TTGATGCTCT TGGCTTACCA GTTGTATTGC 

301 GGCGTGCTGC TGCGCCGTTT TCAGGATTAC AGCAATGCTT TTTCACACCG 

351 CTGGTACCGC GTGTTCAACG AAATCCCCGT GCTGCTGATG GTTGCCGCGC 

401 TGTATCTGGT CGTGTTCAAA CCGTTTTGA 

This encodes a protein having amino acid sequence <SEQ ID 762>: 

1 MMFSWFKLFH LFFVISWFAG LFYLPRIFVN MAMIDVPRGN PEYVRLSGMA 
51 VRLYRFMSPL GFGAVVFGAA IPFAAGWWGS GWVHVKLCLG LMLLAYQLYC 
101 GVLLRRFQDY SNAFSHRWYR VFNEIPVLLM VAALYLVVFK PF* 

ORF102a and ORF102-1 show complete identity in 142 aa overlap: 

                    10        20        30        40        50        60
orf102a.pep MMFSWFKLFHLFFVISWFAGLFYLPRIFVNMAMIDVPRGNPEYVRLSGMAVRLYRFMSPL
            ||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
orf102-1    MMFSWFKLFHLFFVISWFAGLFYLPRIFVNMAMIDVPRGNPEYVRLSGMAVRLYRFMSPL
                    10        20        30        40        50        60

                    70        80        90       100       110       120
orf102a.pep GFGAVVFGAAIPFAAGWWGSGWVHVKLCLGLMLLAYQLYCGVLLRRFQDYSNAFSHRWYR
            ||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
orf102-1    GFGAVVFGAAIPFAAGWWGSGWVHVKLCLGLMLLAYQLYCGVLLRRFQDYSNAFSHRWYR
                    70        80        90       100       110       120

                   130       140
orf102a.pep VFNEIPVLLMVAALYLVVFKPFX
            |||||||||||||||||||||||
orf102-1    VFNEIPVLLMVAALYLVVFKPFX
                   130       140

Homology with a predicted ORF from N. gonorrhoeae 

ORF102 shows 97.9% identity over a 142 aa overlap with a predicted ORF (ORF102ng) from N. 
gonorrhoeae: 

orf102.pep  MMFSWFKLFHLFFVISWFAGLFYLPRIFVNMAMIDVPRGNPEYVRLSGMAVRLYRFMSPL 60
            ||||||||||||||||||||||||||||||||||| ||||||||||||||||||||||||
orf102ng    MMFSWFKLFHLFFVISWFAGLFYLPRIFVNMAMIDAPRGNPEYVRLSGMAVRLYRFMSPL 60

orf102.pep  GFGAVVFGAAIPFAAGWWGSGWVHVKLCLGLMLLAYQLYCGVLLRRFQDYSNAFSHRWYR 120
            |||||||||||||||| |||||||||||||||||||||||||||||||||||||||||||
orf102ng    GFGAVVFGAAIPFAAGRWGSGWVHVKLCLGLMLLAYQLYCGVLLRRFQDYSNAFSHRWYR 120

orf102.pep  VFNEIPVLLMVAALYXVVFKPF 142
            ||||||||||||||| ||||||
orf102ng    VFNEIPVLLMVAALYLVVFKPF 142

The complete length ORF102ng nucleotide sequence <SEQ ID 763> is: 

1 ATGATGTTTT CTTGGTTCAA GCTGTTTCAC TTGTTTTTTG TCATTTCGTG 

51 GTTTGCAGGG CTGTTTTACC TGCCGAGGAT TTTCGTCAAT ATGGCGATGA 

101 TTGATGCGCC GCGCGGCAAT CCCGAGTATG TGCGCCTGTC GGGGATGGCG 

151 GTGCGGTTGT ACCGTTTTAT GTCGCCTTTG GGTTTCGGCG CGGTCGTGTT 

201 CGGCGCGGCG ATACCGTTTG CCGCcggccg GTGGGGCagc ggctggGTTC 

251 ACGTCAAACT GTGTTTGGGC TTGATGCTCT TGGCTTATCA GTTGTATTGC 

301 GGCGTGCTGC TGCGCCGTTT TCAGGATTAC AGCAATGCTT TTTCACACCG 

351 CTGGTACCGC GTGTTCAAcg aAATCCCCGT GCTGCTGATG GTTGCCGCGC 

401 TGTATCTGGT CGTGTTCAAA CCGTTTTGA 

This encodes a protein having amino acid sequence <SEQ ID 764>: 

1 MMFSWFKLFH LFFVISWFAG LFYLPRIFVN MAMIDAPRGN PEYVRLSGMA 
51 VRLYRFMSPL GFGAVVFGAA IPFAAGRWGS GWVHVKLCLG LMLLAYQLYC 
101 GVLLRRFQDY SNAFSHRWYR VFNEIPVLLM VAALYLVVFK PF* 

ORF102ng and ORF102-1 show 98.6% identity in 142 aa overlap: 

10 20 30 40 50 60 

orf 102-1 . pep MMFSWFKLFHLFFVISWFAGLFYLPRIFVNMAMIDVPRGNPEYVRLSGMAVRLYRFMSPL 
lllllilllllilllllillilllllllll|[|lt:|lllll(t[lllllllllliMII 
orfl02ng MMFSWFKLFHLFFVISWFAGLFYLPRIFVNMAMIDAPRGNPEYVRLSGMAVRLYRFMSPL 

10 , 20 30 40 50 60 

70 80 90 100 110 120 

orf 102-1 . pep GFGAVVFG/WVIPFAAGWWGSGWVHVKLCLGLMLLAYQLYCGVLLRRFQDYSNAFSHRWYR 
I I t I M i I I t I I I I I 1 I n I I I I t t I I I I I t i M 1 I I I I M I t I I I I I I I I I t 1 I I t I I 
orfl02ng GFGAWFGAAIPFAAGRWGSGWVHVKLCLGLMLLAYQLYCGVLLRRFQDYSNAFSHRWYR 

70 80 90 100 110 120 

130 140 
orf 102-1 .pep VFNEIPVLLMVAALYLWFKPFX 
1 I I t I M I I I I ) I I t I t t I I M t 
orfl02ng VFNEIPVLLMVAALYLWFKPFX 

130 140 

In addition, ORF102ng shows significant homology to a membrane protein from H. pylori: 

gi|2314656 (AE000647) conserved hypothetical integral membrane protein 
[Helicobacter pylori] Length = 148 
Score = 79.2 bits (192), Expect = 1e-14 
Identities = 50/147 (34%), Positives = 68/147 (46%), Gaps = 13/147 (8%) 

Query: 3   FSWFKLFHLFFVISWFAGLFYLPRIFVNMAMIDAPRGNPEYVRLSGMAVRLYRFMSPLGF 62 
           F W K FH+ VISW A LFYLPR+FV A + V++ +LY F++ 
Sbjct: 8   FLWVKAFHVIAVISWMAALFYLPRLFVYHAENAHKKEFVGWQIQEK--KLYSFIASPAM 65 

Query: 63  GAVVFGAAI P FAAGRWGSGWVHVKLCLGLMLLAYQLYCGVLLRRFQD YSNAFS 115 
           G + + F +G GW+H KL L ++LLAY YC +R + + 
Sbjct: 66  GFTLITGILMLLIEPTLFKSG GWLHAKLALWLLLAYHFYCKKCMRELEKDPTRRN 121 

Query: 116 HRWYRVFNEIPXXXXXXXXXXXXFKPF 142 
           R+YRVFNE P KPF 
Sbjct: 122 ARFYRVFNEAPTILMILIVILVWKPF 148 

Based on this analysis, it is predicted that these proteins from N. meningitidis and N. gonorrhoeae, 
and their epitopes, could be useful antigens for vaccines or diagnostics, or for raising antibodies. 

Example 91 

The following partial DNA sequence was identified in N. meningitidis <SEQ ID 765>: 

1 ATGGCAAAAA TGATGAAATG GGCGGCTGTT GCGGCGGTCG CGGCGGCAGC 

51 GGTTTGGGGC GGATGGTCTT AACTGAAGCC CGAGCCGCAC GTGCTTGATA 

101 TTACGGAAAC GGTCAGGCGC GGC // 

//.. ATTTCGTTTA CGATTTTGTC CGAACCGGAT ACGCCGATTA AGGCGAAGCT 

51 CGACAGCGTC GACCCCGGGC TGACCACGAT GTCGTCGGGC GGTTACAACA 

101 GCAGTACGGA TACGGCTTCC AATGCGGTCT ACTATTATGC CCGTTCGTTT 

151 GTGCCGAATC CGGACGGCAA ACTCGCCACG GGGATGACGA CGCAGAATAC 

201 GGTTGAAATC GACGGCGTGA AAAATGTGCT GATTATTCCG TCGCTGACCG 

251 TGAAAAATCG CGGCGGCAAG GCGTTTGTGC GCGTGTTGGG TGCGGACGGC 

301 AAGGCGGCGG AACGCGAAAT CCGGACCGGT ATGAGAGACA GTATGAATAC 

351 CGAAGTAAAA AGCGGGTTGA AAGAGGGGGA CAAAGTGGTC ATCTCCGAAA 

401 TAACCGCCGC CGAGCAACAG GAAAGCGGCG AACGCGCCCT AGGCGGCCCG 

451 CCGCGCCGAT AA 

This corresponds to the amino acid sequence <SEQ ID 766; ORF85>: 

1 MAKMMKWAAV AAVAAAAVWG GWS.LKPEPH VLDITETVRR G 

51 

101 

151 

201 I SFTILSEPDT 

251 PIKAKLDSVD PGLTTMSSGG YNSSTDTASN AVYYYARSFV PNPDGKLATG 

301 MTTQNTVEID GVKNVLIIPS LTVKNRGGKA FVRVLGADGK AAEREIRTGM 

351 RDSMNTEVKS GLKEGDKVVI SEITAAEQQE SGERALGGPP RR* 

Further work revealed the further partial nucleotide sequence <SEQ ID 767>: 

1 ..GTATCGGTCG GCGCGCAGGC ATCGGGGCAG ATTAAGATAC TTTATGTCAA 

51 ACTCGGGCAA CAGGTTAAAA AGGGCGATTT GATTGCGGAA ATCAATTCGA 

101 CCTCGCAGAC CAATACGCTC AATACGGAAA AATCCAAGTT GGAAACGTAT 

151 CAGGCGAAGC TGGTGTCGGC ACAGATTGCA TTGGGCAGCG CGGAGAAGAA 

201 ATATAAGCGT CAGGCGGCGT TATGGAAGGA AAACGCGACT TCCAAAGAGG 

251 ATTTGGAAAG CGCGCAGGAT GCGTTTGCCG CCGCCAAAGC CAATGTTGCC 

301 GAGCTGAAGG CTTTAATCAG ACAGAGCAAA ATTTCCATCA ATACCGCCGA 

351 GTCGGAATTG GGCTACACGC GCATTACCGC AACGATGGAC GGCACGGTGG 

401 TGGCGATTCT CGTGGAAGAG GGGCAGACTG TGAACGCGGC GCAGTCTACG 

451 CCGACGATTG TCCAATTGGC GAATCTGGAT ATGATGTTGA ACAAAATGCA 

501 GATTGCCGAG GGCGATATTA CCAAGGTGAA GGCGGGGCAG GATATTTCGT 

551 TTACGATTTT GTCCGAACCG GATACGCCGA TTAAGGCGAA GCTCGACAGC 

601 GTCGACCCCG GGCTGACCAC GATGTCGTCG GGCGGTTACA ACAGCAGTAC 

651 GGATACGGCT TCCAATGCGG TCTACTATTA TGCCCGTTCG TTTGTGCCGA 

701 ATCCGGACGG CAAACTCGCC ACGGGGATGA CGACGCAGAA TACGGTTGAA 

751 ATCGACGGCG TGAAAAATGT GCTGATTATT CCGTCGCTGA CCGTGAAAAA 

801 TCGCGGCGGC AAGGCGTTTG TGCGCGTGTT GGGTGCGGAC GGCAAGGCGG 

851 CGGAACGCGA AATCCGGACC GGTATGAGAG ACAGTATGAA TACCGAAGTA 

901 AAAAGCGGGT TGAAAGAGGG GGACAAAGTG GTCATCTCCG AAATAACCGC 

951 CGCCGAGCAA CAGGAAAGCG GCGAACGCGC CCTAGGCGGC CCGCCGCGCC 

1001 GATAA 

This corresponds to the amino acid sequence <SEQ ID 768; ORF85-l>: 

1 ..VSVGAQASGQ IKILYVKLGQ QVKKGDLIAE INSTSQTNTL NTEKSKLETY 

51 QAKLVSAQIA LGSAEKKYKR QAALWKENAT SKEDLESAQD AFAAAKANVA 

101 ELKALIRQSK ISINTAESEL GYTRITATMD GTVVAILVEE GQTVNAAQST 

151 PTIVQLANLD MMLNKMQIAE GDITKVKAGQ DISFTILSEP DTPIKAKLDS 

201 VDPGLTTMSS GGYNSSTDTA SNAVYYYARS FVPNPDGKLA TGMTTQNTVE 

251 IDGVKNVLII PSLTVKNRGG KAFVRVLGAD GKAAEREIRT GMRDSMNTEV 

301 KSGLKEGDKV VISEITAAEQ QESGERALGG PPRR* 

Computer analysis of this amino acid sequence gave the following results: 
Homology with a predicted ORF from N. meningitidis (strain A) 

ORF85 shows 87.8% identity over a 41aa overlap and 99.3% identity over a 153aa overlap with 
an ORF (ORF85a) from strain A of N. meningitidis: 

10 20 30 40 

orf 85 . pep MAKMMKWAAVAAVAAAAVWGGWS-LKPEPHVLDITETVRRG 
I I I I I I I 1 I I I t I I I t I I I I I i I Mill:: lllfillt 
orf 85a MAKMMKWAAVAAVAAAAVWGGWSYLKPEPQAAYITETVRRGDISRTVSATGEISPSNLVS 
10 20 30 40 50 60 

// 

80 90 100 

orf 85 . pep ISFTILSEPDTPIKAKLDSVDPGLTTMSSG 

I I I I t I t I M I M I I I I It I I I I I I I I I M 
orf 85a TIVQLANLDMMLNKMQIAEGDITKVKAGQDISFTILSEPDTPIKAKLDSVDPGLTTMSSG 
210 220 230 240 250 260 

110 120 130 140 150 160 

orf 85 . pep GYNSSTDTASNAVYYYARSFVPNPDGKLATGMTTQNTVEIDGVKNVLIIPSLTVKNRGGK 
ttlilltlllllllltlllllMMIttlltlllltllMltlllllltllllllllll: 
orf 85a GYNSSTDTASNAVYYYARSFVPNPDGKLATGMTTQNTVEIDGVKNVLIIPSLTVKNRGGR 
270 280 290 300 310 320 

170 180 190 200 210 220 

orf 85. pep AFVRVLGADGKAAEREIRTGMRDSMNTEVKSGLKEGDKWISEITAAEQQESGERALGGP 
M I M n i I I I I I I I I I M I I I I I I M I I I I I I I I I I I I I I M I i I I I i I f I I I I I I I I I 
o r f 8 5a AFVRVLGADGKAAERE IRTGMRDSMNTE VKSGLKEGDKWI SE ITAAEQQE SGERALGGP 

330 340 350 360 370 380 

230 

orf 85. pep PRRX 
till 

orf85a PRRX 
390 

The complete length ORF85a nucleotide sequence <SEQ ID 769> is: 



1 ATGGCAAAAA TGATGAAATG GGCGGCTGTT GCGGCGGTCG CGGCGGCAGC 
51 GGTTTGGGGC GGATGGTCTT ATCTGAAGCC CGAGCCGCAG GCTGCTTATA 
101 TTACGGAAAC GGTCAGGCGC GGCGACATCA GCCGGACGGT TTCTGCAACA 
151 GGGGAGATTT CGCCGTCCAA CCTGGTATCG GTCGGCGCGC AGGCATCGGG 
201 GCAGATTAAG AAACTTTATG TCAAACTCGG GCAACAGGTT AAAAAGGGCG 
251 ATTTGATTGC GGAAATCAAT TCGACCTCGC AGACCAATAC GCTCAATACG 
301 GAAAAATCCA AATTGGAAAC GTATCAGGCG AAGCTGGTGT CGGCACAGAT 
351 TGCATTGGGC AGCGCGGAGA AGAAATATAA GCGTCAGGCG GCGTTGTGGA 
401 AGGATGATGC GACCGCTAAA GAAGATTTGG AAAGCGCACA GGATGCGCTT 
451 GCCGCCGCCA AAGCCAATGT TGCCGAGCTG AAGGCTCTAA TCAGACAGAG 
501 CAAAATTTCC ATCAATACCG CCGAGTCGGA ATTGGGCTAC ACGCGCATTA 
551 CCGCAACGAT GGACGGCACG GTGGTGGCGA TTCTCGTGGA AGAGGGGCAG 
601 ACTGTGAACG CGGCGCAGTC TACGCCGACG ATTGTCCAAT TGGCGAATCT 
651 GGATATGATG TTGAACAAAA TGCAGATTGC CGAGGGCGAT ATTACCAAGG 
701 TGAAGGCGGG GCAGGATATT TCGTTTACGA TTTTGTCCGA ACCGGATACG 
751 CCGATTAAGG CGAAGCTCGA CAGCGTCGAC CCCGGGCTGA CCACGATGTC 
801 GTCGGGCGGC TACAACAGCA GTACGGATAC GGCTTCCAAT GCGGTCTACT 
851 ATTATGCCCG TTCGTTTGTG CCGAATCCGG ACGGCAAACT CGCCACGGGG 
901 ATGACGACGC AGAATACGGT TGAAATCGAC GGTGTGAAAA ATGTGCTGAT 
951 TATTCCGTCG CTGACCGTGA AAAATCGCGG CGGCAGGGCG TTTGTGCGCG 
1001 TGTTGGGTGC AGACGGCAAG GCGGCGGAAC GCGAAATCCG GACCGGTATG 
1051 AGAGACAGTA TGAATACCGA AGTAAAAAGC GGGTTGAAAG AGGGGGACAA 
1101 AGTGGTCATC TCCGAAATAA CCGCCGCCGA GCAGCAGGAA AGCGGCGAAC 
1151 GCGCCCTAGG CGGCCCGCCG CGCCGATAA 



This encodes a protein having amino acid sequence <SEQ ID 770>: 



1 MAKMMKWAAV AAVAAAAVWG GWSYLKPEPQ AAYITETVRR GDISRTVSAT 

51 GEISPSNLVS VGAQASGQIK KLYVKLGQQV KKGDLIAEIN STSQTNTLNT 

101 EKSKLETYQA KLVSAQIALG SAEKKYKRQA ALWKDDATAK EDLESAQDAL 

151 AAAKANVAEL KALIRQSKIS INTAESELGY TRITATMDGT VVAILVEEGQ 

201 TVNAAQSTPT IVQLANLDMM LNKMQIAEGD ITKVKAGQDI SFTILSEPDT 

251 PIKAKLDSVD PGLTTMSSGG YNSSTDTASN AVYYYARSFV PNPDGKLATG 

301 MTTQNTVEID GVKNVLIIPS LTVKNRGGRA FVRVLGADGK AAEREIRTGM 

351 RDSMNTEVKS GLKEGDKVVI SEITAAEQQE SGERALGGPP RR* 

ORF85a and ORF85-1 show 98.2% identity in 334 aa overlap: 

30 40 50 60 70 80 

orf 85a . pep PQAAYITETVRRGDISRTVSATGEISPSNLVSVGAQASGQIKKLYVKLGQQVKKGDLIAE 

I I I i i I i M M I I I I I I I I I I I I t I I I I I 
orf 85-1 VSVGAQASGQIKILYVKLGQQVKKGDLIAE 

10 20 30 

90 100 110 120 130 140 

orf 85a . pep INSTSQTNTLNTEKSKLETYQAKLVSAQIALGSAEKKYKRQAALWKDDATAKEDLESAQD 

illlllllll||MIIIIIIIIII)IIIIIMIIIinilllllM::ll:ltlllllll 
orf 85-1 INSTSQTNTLNTEKSKLETYQAKLVSAQIALGSAEKKYKRQAALWKENATSKEDLESAQD 
40 50 60 70 80 90 

150 160 170 180 190 200 

orf 85a . pep ALAAAKANVAELKALIRQSKISINTAESELGYTRITATMDGTWAILVEEGQTVNAAQST 
I : n I I I I I I i I i I I I I I I I I M I I I 1 I I i I I I I I I I I I I I i I i I I I I t I I t I t I I I I I I 
orf 85-1 AFAAAKANVAELKALIRQSKISINTAESELGYTRITATMDGTWAILVEEGQTVNAAQST 
100 110 120 130 140 150 

210 220 230 240 250 260 

orf 85a . pep PTIVQLANLDMMLNKMQIAEGDITKVKAGQDISFTILSEPDTPIKAKLDSVDPGLTTMSS 
I I I I I I I I M I i I I I I I I I M I I t I I I I I I I I I i t I [ M I I I i I [ I I I I I I I n I I I t t I 
orf 85-1 PTIVQLANLDMMLNKMQIAEGDXTKVKAGQDISFTILSEPDTPIKAKLDSVDPGLTTMSS 
160 170 180 190 200 210 

270 280 290 300 310 320 

orf 85a . pep GGYNSSTDTASNAVYYYARSFVPNPDGKLATGMTTQNTVEIDGVKNVLIIPSLTVKNRGG 
llllllllllMlllllllllllinilllllllllllMMMIilllllillMIIII 
orf 85-1 GGYNSSTDTASNAVYYYARSEVPNPDGKLATGMTTQNTVEIDGVKNVLIIPSLTVKNRGG 
220 230 240 250 260 270 

330 340 350 360 370 380 

orf 85a . pep RAFVRVLGADGKAAEREIRTGMRDSMNTEVKSGLKEGDKWISEITAAEQQESGERALGG 
: I I I I I I t t I I i I I t I I I I I I I I I I t ( I I I I I I I t M I I I t I t t I i t I I I I I M I I I I I 1 
orf 85-1 KAFVRVLGADGKAAEREIRTGMRDSMNTEVKSGLKEGDKWISEITAAEQQESGERALGG 
280 290 300 310 320 330 



390 

orf 85a. pep PPRRX 
I 1 I I I 

orf85-l PPRRX 

Figure 19D shows plots of hydrophilicity, antigenic index, and AMPHI regions for ORF85a. 

Homology with a predicted ORF from N. gonorrhoeae 

ORF85 shows a high degree of identity with a predicted ORF (ORF85ng) from N.gonorrhoeae: 

ORF85     1 MAKMMKWAAVAAVAAAAVWGGWS.LKPEPHVLDITETVRRG 40
            ||||||||||||||||||||||| |||||::  |||:||||
ORF85ng   1 MAKMMKWAAVAAVAAAAVWGGWSYLKPEPQAAYITEAVRRGDISRTVSAT 50

ORF85   201                                        ISFTILSEPDT 250
                                                   |||||||||||
ORF85ng 201 TVNAAQSTPTIVQLANLDMMLNKMQIAEGDITKVKAGQDISFTILSEPDT 250

ORF85   251 PIKAKLDSVDPGLTTMSSGGYNSSTDTASNAVYYYARSFVPNPDGKLATG 300
            ||||||||||||||||||||||||||||||||||||||||||||||||||
ORF85ng 251 PIKAKLDSVDPGLTTMSSGGYNSSTDTASNAVYYYARSFVPNPDGKLATG 300

ORF85   301 MTTQNTVEIDGVKNVLIIPSLTVKNRGGKAFVRVLGADGKAAEREIRTGM 350
            ||||||||||||||||:||||||||||||||||||||||||:||||||||
ORF85ng 301 MTTQNTVEIDGVKNVLLIPSLTVKNRGGKAFVRVLGADGKAVEREIRTGM 350

ORF85   351 RDSMNTEVKSGLKEGDKVVISEITAAEQQESGERALGGPPRR 393
            :|||||||||||||||||||||||||||||||||||||||||
ORF85ng 351 KDSMNTEVKSGLKEGDKVVISEITAAEQQESGERALGGPPRR 393

The complete length ORF85ng nucleotide sequence <SEQ ID 771> is: 

1 ATGGCAAAAA TGATGAAATG GGCGGCTGTT GCGGCGGTCG CGGCGGCaac 
51 GGTTTGGGGC GGATGGTCTT ATCTGAAGCC CGAACCGCAG GCTGCTTATA 
101 TTACGGAaac ggTCAGGCGC GGCGATATCA GCCGGACGGT TTCCGCGACG 
151 GgcgAGATTT CGCCGTCCAA CCTGGTATCG GTCGGCGCGC AGGCTTCGGG 
201 GCAGATTAAA AAGCTTTATG TCAAACTCGG GCAACAGGTC AAAAAGGGCG 
251 ATTTGATTGC GGAAATCAAT TCGACCACGC AGACCAACAC GATCGATATG 
301 GAAAAATCCA AATTGGAAAC GTATCAGGCG AAGCTGGTGT CGGCACAGAT 
351 TGCATTGGGC AGCGCGGAGA AGAAATATAA GCGTCAGGCG GCGTTGTGGA 
401 AGGATGATGC GACCTCTAAA GAAGATTTGG AAAGCGCGCA GGATGCGCTT 
451 GCCGCCGCCA AAGCCAATGT TGCCGAGTTG AAGGCTTTAA TCAGACAGAG 
501 CAAAATTTCC ATCAATACCG CCGAGTCGGA TTTGGGCTAC ACGCGCATTA 
551 CCGCGACGAT GGACGGCACG GTGGTGGCGA TTCCCGTGGA AGAGGGGCAG 
601 ACTGTGAACG CGGCGCAGTC TACGCCGACG ATTGTCCAAT TGGCGAATCT 
651 GGATATGATG TTGAACAAAA TGCAGATTGC CGAGGGCGAT ATTACCAAGG 
701 TGAAGGCGGG GCAGGATATT TCGTTTACGA TTTTGTCCGA ACCGGATACG 
751 CCGATTAAGG CGAAGCTCGA CAGCGTCGAC CCCGGGCTGA CCACGATGTC 
801 GTCGGGCGGC TACAACAGCA GTACGGATAC GGCTTCCAAT GCGGTCTATT 
851 ATTATGCCCG TTCGTTTGTG CCGAATCCGG ACGGCAAACT CGCCACGGGG 
901 ATGACGACGC AGAATACGGT TGAAATCGAC GGTGTGAAAA ATGTGTTGCT 
951 TATTCCGTCG CTGACCGTGA AAAATCGCGG CGGCAAGGCG TTCGTACGCG 
1001 TGTTGGGTGC GGACGGCAAG GCAGTGGAAC GCGAAATCCG GACCGGTATG 
1051 AAAGACAGTA TGAATACCGA AGTGAAAAGC GGGTTGAAAG AGGGGGACAA 
1101 AGTGGTCATC TCCGAAATAA CCGCCGCCGA GCAGCAGGAA AGCGGCGAAC 
1151 GCGCCCTAGG CGGCCCGCCG CGCCGATAA 



This encodes a protein having amino acid sequence <SEQ ID 772>: 



1 MAKMMKWAAV AAVAAAAVWG GWSYLKPEPQ AAYITEAVRR GDISRTVSAT 
51 GEISPSNLVS VGAQASGQIK KLYVKLGQQV KKGDLIAEIN STTQTNTIDM 
101 EKSKLETYQA KLVSAQIALG SAEKKYKRQA ALWKDDATSK EDLESAQDAL 
151 AAAKANVAEL KALIRQSKIS INTAESDLGY TRITATMDGT VVAIPVEEGQ 
201 TVNAAQSTPT IVQLANLDMM LNKMQIAEGD ITKVKAGQDI SFTILSEPDT 
251 PIKAKLDSVD PGLTTMSSGG YNSSTDTASN AVYYYARSFV PNPDGKLATG 
301 MTTQNTVEID GVKNVLLIPS LTVKNRGGKA FVRVLGADGK AVEREIRTGM 
351 KDSMNTEVKS GLKEGDKVVI SEITAAEQQE SGERALGGPP RR* 



ORF85ng and ORF85-1 show 96.1% identity in 334 aa overlap: 



30 40 50 60 70 80 

orf85ng PQAAYITETVRRGDISRTVSATGEISPSNLVSVGAQASGQIKKLYVKLGQQVKKGDLIAE 

I I I I I I M I I t t t I I I I t I I M f I t I M I 
or f 8 5-1 VSVGAQASGQIKILYVKLGQQVKKGDLIAE 

10 20 30 

90 100 110 120 130 140 

orf85ng INSTTQTNTIDMEKSKLETYQAKLVSAQIALGSAEKKYKRQAALWKDDATSKEDLESAQD 

I I t I : I I I I : : [ I I I I I I t I I t I i 1 t I I M 1 1 I M t I I I I t t I M : : I I I I ) I I t I t I I 
orf85-l INSTSQTNTLNTEKSKLETYQAKLVSAQIALGSAEBCKYKRQAALWKENATSKEDLESAQD 
40 50 60 70 80 90 



150 160 170 180 190 200 

orf85ng ALAAAKANVAELKALIRQSKISINTAESDLGYTRITATMDGTWAIPVEEGQTVNAAQST 
|:|||Mlllllitlllltliiltllll:ilMlltllllillllt I i I t I I I I I t I I I 
orf85-l AFAAAKANVAELKALIRQSKISINTAESELGYTRITATMDGTWAILVEEGQTVNAAQST 
100 110 120 130 140 150 



210 220 230 240 250 260 

orfSSng PTIVQLANLDMMLNKMQIAEGDITKVKAGQDISFTILSEPDTPIKAKLDSVDPGLTTMSS 
lllllllltlMntlMllllllllllllllllllitlllMllllllillllllllll 
orf85-l PTIVQLANLDMMLNKMQIAEGDITKVKAGQDISFTILSEPDTPIKAKLDSVDPGLTTMSS 
160 170 180 190 200 210 

270 280 290 300 310 320 

orf85ng GGYNSSTDTASNAVYYYARSFVPNPDGKLATGMTTQNTVEIDGVKNVLLIPSLTVKNRGG 
I t I I It I t I I I t I I I I I I I t t It I I I I I I I I I I M t I i I I i I I I M I I : I I 1 t I t I I I I I 
orf85-l GGYNSSTDTASNAVYYYARSFVPNPDGKLATGMTTQNTVEIDGVKNVLIIPSLTVKNRGG 
220 230 240 250 260 270 



           330       340       350       360       370       380
orf85ng     KAFVRVLGADGKAVEREIRTGMKDSMNTEVKSGLKEGDKVVISEITAAEQQESGERALGG
            |||||||||||||:||||||||:|||||||||||||||||||||||||||||||||||||
orf85-1     KAFVRVLGADGKAAEREIRTGMRDSMNTEVKSGLKEGDKVVISEITAAEQQESGERALGG
                   280       290       300       310       320       330

           390
orf85ng     PPRRX
            |||||
orf85-1     PPRRX



In addition, ORF85ng shows significant homology to an E.coli membrane fusion protein: 

gi|1787104 (AE000189) o380; 27% identical (27 gaps) to 332 residues from 
membrane fusion protein precursor, MTRC_NEIGO SW: P43505 (412 aa) [Escherichia 
coli] Length = 380 
Score = 193 bits (485), Expect = 2e-48 
Identities = 120/345 (34%), Positives = 182/345 (51%), Gaps = 13/345 (3%) 

Query: 29 
           P Y T VR GD+ ++V ATG++ V VGAQ SGQ+K L V +G +VKK L+ 
Sbjct: 41  PVPTYQTLIVRPGDLQQSVLATGKLDALRKVDVGAQVSGQLKTLSVAIGDKVKKDQLLGV 100 

Query: 89 
           I+ N I ++ L +A+ A+ L A Y RQ L + A S++ 
Sbjct: 101 

Query: 149 
           I++++ S++TA+++L YTRI A M G V I +GQTV AAQ 
Sbjct: 161 

Query: 209 
           P 1+ I1A++ ML K Q++E D+ +K GQ FT+L +P T + ++ VP 
Sbjct: 221 :LTLADMSAMLVKAQVSEADVIHLKPGQECAWFTVLGDPLTRYEGQIKDVLP 273 

Query: 269 GGYNSSTDTASNAVYYYARSFVPNPDGKLATGMTTQNTVEIDGVKNVLLIPSLTVKNRGG 328 
           + + ++A++YYAR VPNP+G L MT Q +++ VKNVL IP + + G 
Sbjct: 274 TPEKVNDAIFYYTUIFEVPNPNGLLRLDMTAQVHIQLTDVKNVLTIPLSALGDPVG 328 

Query: 329 KAFVRV-LGADGKAVEREIRTGMKDSMNTEVKSGLKEGDKVVISE 372 
           +V L +G+ ERE+ G ++ + E+ GL+ GD+WI E 
Sbjct: 329 TYKVKLLRNGETREREVTIGARNDTDVEIVKGLEAGDEWIGE 373 

Based on this analysis, it was predicted that the proteins from N. meningitidis and N. gonorrhoeae, 
and their epitopes, could be useful antigens for vaccines or diagnostics, or for raising antibodies. 

ORF85-1 (40.4kDa) was cloned in the pGex vectors and expressed in E.coli, as described above. 
The products of protein expression and purification were analyzed by SDS-PAGE. Figure 19A 
shows the results of affinity purification of the GST-fusion protein. Purified GST-fusion protein 
was used to immunise mice, whose sera were used for Western blot (Figure 19B), FACS analysis 
(Figure 19C), and ELISA (positive result). These experiments confirm that ORF85-1 is a 
surface-exposed protein, and that it is a useful immunogen. 



Example 92 

The following partial DNA sequence was identified in N. meningitidis <SEQ ID 773>: 

1 ..ATTCCCGCCA CGATGACATT TGAACGCAGC GGCAATGCTT ACAAAATCGT 

51 TTCGACGATT AAAGTGCCGC TATACAATAT CCGTTTCGAG TCCGGCGGTA 

101 CGGTTGTCGG CAATACCCTG CACCCTACCT ACTATAGAGA CATACGCAGG 

151 GGCAAACTGT ATGCGGAAgc CAAATTCGCC GACgGcAGCG TAACTTACGG 

201 CAAAGCGGGC GAGAGCAAAA CCGAGCAAAG CCCCAAGGCT ATGGATTTGT 

251 TCACGCTTGC CTGGCAGTTG GCGGCAAATG ACGCGAAACT CCCCCCGGGG 

301 CTGAAAATCA CCAACGGCAA AAAACTTTAT TCCGTCGGCG GTTTGAATAA 

351 GGCGGGTACA GGAAAATACA GCATAGGCGG CGTGGAAACC GAAGTCGTCA 

401 AATATCGGGT GCGGCGCGGC GACGATGCGG TAATGTATTT cTTCGCACCG 

451 TCCCTGAACA ATATTCCGGC ACAAATCGGC TATACCGACG ACGGCAAAAC 

501 CTATACGCTG AAACTCAAAT CGGTGCAGAT CAACGGCCAG GCAGCCAAAC 

551 CGTAA 

This corresponds to the amino acid sequence <SEQ ID 774; ORF120>: 



1 ..IPATMTFERS GNAYKIVSTI KVPLYNIRFE SGGTVVGNTL HPTYYRDIRR 

51 GKLYAEAKFA DGSVTYGKAG ESKTEQSPKA MDLFTLAWQL AANDAKLPPG 

101 LKITNGKKLY SVGGLNKAGT GKYSIGGVET EVVKYRVRRG DDAVMYFFAP 

151 SLNNIPAQIG YTDDGKTYTL KLKSVQINGQ AAKP* 

Further work revealed the complete nucleotide sequence <SEQ ID 775>: 



1 ATGATGAAGA CTTTTAAAAA TATATTTTCC GCCGCCATTT TGTCCGCCGC 

51 CCTGCCGTGC GCGTATGCGG CAGGGCTGCC CCAATCCGCC GTGCTGCACT 

101 ATTCCGGCAG CTACGGCATT CCCGCCACGA TGACATTTGA ACGCAGCGGC 

151 AATGCTTACA AAATCGTTTC GACGATTAAA GTGCCGCTAT ACAATATCCG 

201 TTTCGAGTCC GGCGGTACGG TTGTCGGCAA TACCCTGCAC CCTACCTACT 

251 ATAGAGACAT ACGCAGGGGC AAACTGTATG CGGAAGCCAA ATTCGCCGAC 

301 GGCAGCGTAA CTTACGGCAA AGCGGGCGAG AGCAAAACCG AGCAAAGCCC 

351 CAAGGCTATG GATTTGTTCA CGCTTGCCTG GCAGTTGGCG GCAAATGACG 

401 CGAAACTCCC CCCGGGGCTG AAAATCACCA ACGGCAAAAA ACTTTATTCC 

451 GTCGGCGGTT TGAATAAGGC GGGTACAGGA AAATACAGCA TAGGCGGCGT 

501 GGAAACCGAA GTCGTCAAAT ATCGGGTGCG GCGCGGCGAC GATGCGGTAA 

551 TGTATTTCTT CGCACCGTCC CTGAACAATA TTCCGGCACA AATCGGCTAT 

601 ACCGACGACG GCAAAACCTA TACGCTGAAA CTCAAATCGG TGCAGATCAA 

651 CGGCCAGGCA GCCAAACCGT AA 

This corresponds to the amino acid sequence <SEQ ID 776; ORF120-1>: 

1 MMKTFKNIFS AAILSAALPC AYAAGLPQSA VLHYSGSYGI PATMTFERSG 

51 NAYKIVSTIK VPLYNIRFES GGTVVGNTLH PTYYRDIRRG KLYAEAKFAD 

101 GSVTYGKAGE SKTEQSPKAM DLFTLAWQLA ANDAKLPPGL KITNGKKLYS 

151 VGGLNKAGTG KYSIGGVETE VVKYRVRRGD DAVMYFFAPS LNNIPAQIGY 

201 TDDGKTYTLK LKSVQINGQA AKP* 
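
The partial sequence <SEQ ID 773> and the complete sequence <SEQ ID 775> can be related by locating the start of the partial fragment inside the complete ORF. The following is an illustrative sketch only: it uses the first three rows of SEQ ID 775 and the first 30 bases of SEQ ID 773 as printed above, and the helper name `locate_fragment` is hypothetical.

```python
# First 150 nt of the complete ORF120-1 sequence (SEQ ID 775), rows 1, 51 and 101.
COMPLETE_5PRIME = "".join("""
ATGATGAAGA CTTTTAAAAA TATATTTTCC GCCGCCATTT TGTCCGCCGC
CCTGCCGTGC GCGTATGCGG CAGGGCTGCC CCAATCCGCC GTGCTGCACT
ATTCCGGCAG CTACGGCATT CCCGCCACGA TGACATTTGA ACGCAGCGGC
""".split())

# First 30 nt of the partial sequence (SEQ ID 773).
PARTIAL_START = "ATTCCCGCCACGATGACATTTGAACGCAGC"


def locate_fragment(fragment: str, full: str):
    """Return (0-based nucleotide offset, in-frame flag, 1-based residue number)."""
    offset = full.find(fragment)
    if offset < 0:
        raise ValueError("fragment not found in the full-length sequence")
    return offset, offset % 3 == 0, offset // 3 + 1


offset, in_frame, residue = locate_fragment(PARTIAL_START, COMPLETE_5PRIME)
print(f"partial sequence starts at nt {offset + 1} (residue {residue}), in frame: {in_frame}")
```

Under these assumptions the fragment begins at nucleotide 118 of the complete ORF, in frame, at residue 40, which matches the partial protein <SEQ ID 774> starting with IPATMTFERS.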

Computer analysis of this amino acid sequence gave the following results: 
Homology with a predicted ORF from N. meningitidis (strain A) 

ORF120 shows 92.4% identity over a 184aa overlap with an ORF (ORF120a) from strain A of N. 
meningitidis: 



10 20 30 

orf 120 . pep IPATMTFERSGNAYKIVSTIKVPLYNIRFE 

I I I I : It I I i I i I 1 I I I I i I I I I 

orf 120a SAAILSAALPCAYAAGLPXSAVLHYSGSYGIPATXXXXXXXNAXKIVSTIKVPLYNIRFE 
10 20 30 40 50 60 



40 50 60 70 80 90 

orf 120 . pep SGGTWGNTLHPTYYRDIRRGKLYAEAKFADGSVTYGKAGESKTEQSPKAMDLFTLAWQL 
I I I i I M M I 1 M I I I I I t I I I I i I I I I I I I 1 I I t I I I t : i I I I 1 I I I I I I I i i t 

orf 120a SGGTWGNTLHPTYYRDIRRGKLYAEAKFADGSVTYGKAXXXXXXQSPKAMDLFTLAWQL 
70 80 90 100 110 120 



100 110 120 130 140 150 

orf 120 . pep AANDAKLPPGLKITNGKKLYSVGGLNKAGTGKYSIGGVETEWKYRVRRGDDAVMYFFAP 
tlillllllllltllllllltlllllllllillllltlltllllillliillMlllllt 
orf 120a AANDAKLPPGLKITNGKKLYSVGGLNKAGTGKYSIGGVETEWKYRVRRGDDAVMYFFAP 
130 140 150 160 170 180 

160 170 180 

orf 120 . pep SLNNIPAQIGYTDDGKTYTLKLKSVQINGQAAKPX 
[ i t I t I I I I I I i i I I I t I i j I I I i I I t I I t I I I I I 
orf 120a SLNNIPAQIGYTDDGKTYTLKLKSVQINGQAAKPX 
190 200 210 220 

The complete length ORF120a nucleotide sequence <SEQ ID 777> is: 



1 ATGATGAAGA CTTTTAAAAA TATATTTTCC GCCGCCATTT TGTCCGCCGC 

51 CCTGCCGTGC GCGTATGCGG CAGGGCTGCC CNAATCCGCC GTGCTGCACT 

101 ATTCCGGCAG CTACGGCATT CCCGCCACNA NNANNTNNGN ACNNNGNGNC 

151 AATGCTTNCA AAATCGTTTC GACGATTAAA GTGCCGCTAT ACAATATCCG 

201 TTTCGAGTCC GGCGGTACGG TTGTCGGCAA TACCCTGCAC CCTACCTACT 

251 ATAGAGACAT ACGCAGGGGC AAACTGTATG CGGAAGCCAA ATTCGCCGAC 

301 GGCAGCGTAA CCTACGGCAA AGCGGNNNNN ANCNNNNNNG NGCAAAGCCC 

351 CAAGGCTATG GATTTGTTCA CGCTTGCNTG GCAGTTGGCG GCAAATGACG 

401 CGAAACTCCC CCCGGGGCTG AAAATCACCA ACGGCAAAAA ACTTTATTCC 

451 GTCGGCGGTT TGAATAAGGC GGGTACAGGA AAATACAGCA TAGGCGGCGT 

501 GGAAACCGAA GTCGTCAAAT ATCGGGTGCG GCGCGGCGAC GATGCGGTAA 

551 TGTATTTCTT CGCACCGTCC CTGAACAATA TTCCGGCACA AATCGGCTAT 

601 ACCGACGACG GCAAAACCTA TACGCTGAAA CTCAAATCGG TGCAGATCAA 

651 CGGCCAGGCA GCCAAACCGT AA 

This encodes a protein having amino acid sequence <SEQ ID 778>: 



1 MMKTFKNIFS AAILSAALPC AYAAGLPXSA VLHYSGSYGI PATXXXXXXX 

51 NAXKIVSTIK VPLYNIRFES GGTVVGNTLH PTYYRDIRRG KLYAEAKFAD 

101 GSVTYGKAXX XXXXQSPKAM DLFTLAWQLA ANDAKLPPGL KITNGKKLYS 

151 VGGLNKAGTG KYSIGGVETE VVKYRVRRGD DAVMYFFAPS LNNIPAQIGY 

201 TDDGKTYTLK LKSVQINGQA AKP* 

ORF120a and ORF120-1 show 93.3% identity in 223 aa overlap: 



10 20 30 40 50 60 

orf 120a . pep MMKTFKNIFSAAILSAALPCAYAAGLPXSAVLHYSGSYGIPATXXXXXXXNAXKIVSTIK 
I I t M t I I I i I I I i I M I I I I I I I I I i I I I I t t I I I I I I t I I : II I I I I I I i 

orf 120-1 MMKTFKNIFSAAILSAALPCAYAAGLPQSAVLHYSGSYGIPATMTFERSGNAYKIVSTIK 

10 20 30 40 50 60 



70 80 90 100 110 120 

orf 120a . pep VPLYNIRFESGGTWGNTLHPTYYRDIRRGKLYAEAKFADGSVTYGKAXXXXXXQSPKAM 
I I I M I I I I t i I [ I I I I I t I I I I I I I I I I M I I I I I I t I I I t I I t I I I : I I I I t I 

orf 120-1 VPLYNIRFESGGTWGNTLHPTYYRDIRRGKLYAEAKFADGSVTYGKAGESKTEQSPKAM 

70 80 90 100 110 120 



130 140 150 160 170 180 

orf 120a . pep DLFTLAWQLAANDAKLPPGLKITNGKKLYSVGGLNKAGTGKYSIGGVETEWKYRVRRGD 
I I I I I I I i I I I I I M i I M I t I I I I I I I t t I I I I M I t I i I M I I I I I I I I I I I i I I t I I 
orfl20-l DLFTLAWQLAANDAKLPPGLKITNGKKLYS VGGLNKAGTGKYS I GGVETE WKYRVRRGD 

130 140 150 160 170 180 



190 200 210 220 

orf 120a . pep DAVMYFFAPSLNNIPAQIGYTDDGKTYTLKLKSVQINGQAAKPX 
I I I I [ I I I I I I I I I I I I I 1 I t I [ I I I I I i t I I I I M i t I I I I i I 
orf 120-1 DAVMYFFAPSLNNIPAQIGYTDDGKTYTLKLKSVQINGQAAKPX 

190 200 210 220 



Homology with a predicted ORF from N. gonorrhoeae 

ORF120 shows 97.8% identity over 184 aa overlap with a predicted ORF (ORF120ng) from N. 
gonorrhoeae: 

orf120.pep                                IPATMTFERSGNAYKIVSTIKVPLYNIRFE 30
                                          ||||||||||||||||||||||||||||||
orf120ng    SAAILSAALPCAYAARLPQSAVLHYSGSYGIPATMTFERSGNAYKIVSTIKVPLYNIRFE 69

orf120.pep  SGGTVVGNTLHPTYYRDIRRGKLYAEAKFADGSVTYGKAGESKTEQSPKAMDLFTLAWQL 90
            ||||||||||||:||:||||||||||||||||||||||||||||||||||||||||||||
orf120ng    SGGTVVGNTLHPAYYKDIRRGKLYAEAKFADGSVTYGKAGESKTEQSPKAMDLFTLAWQL 129

orf120.pep  AANDAKLPPGLKITNGKKLYSVGGLNKAGTGKYSIGGVETEVVKYRVRRGDDAVMYFFAP 150
            ||||||||||||||||||||||||||||||||||||||||||||||||||||:| |||||
orf120ng    AANDAKLPPGLKITNGKKLYSVGGLNKAGTGKYSIGGVETEVVKYRVRRGDDTVTYFFAP 189

orf120.pep  SLNNIPAQIGYTDDGKTYTLKLKSVQINGQAAKP 184
            ||||||||||||||||||||||||||||||||||
orf120ng    SLNNIPAQIGYTDDGKTYTLKLKSVQINGQAAKP 223



The complete length ORF120ng nucleotide sequence <SEQ ID 779> is: 

1 ATGATGAAGA CTTTTAAAAA TATATTTTCC GCCGCCATTT TGTCCGCCGC 
51 CCTGCCGTGC GCGTATGCGG CAAGGCTACC CCAATCCGCC GTGCTGCACT 
101 ATTCCGGCAG CTACGGCATT CCCGCCACGA TGACATTTGA ACGCAGGGGC 
151 AATGCTTACA AAATCGTTTC GACGATTAAA GTGCCGCTAT ACAATATCCG 
201 TTTCGAATCC GGCGGTACGG TTGTCGGCAA TACCCTGCAC CCTGCCTACT 
251 ATAAAGACAT ACGCAGGGGC AAACTGTATG CGGAAGCCAA ATTCGCCGAC 
301 GGCAGCGTAA CCTACGGCAA AGCGGGCGAG AGCAAAACCG AGCAAAGCCC 
351 CAAGGCTATG GATTTGTTCA CGCTTGCCTG GCAGTTGGCG GCAAATGACG 
401 CGAAACTCCC CCCGGGTCTG AAAATCACCA ACGGCAAAAA ACTTTATTCC 
451 GTCGGCGGCC TGAATAAGGC GGGTACGGGA AAATACAGCA TaggCGGCGT 
501 GGAAACCGAA GTCGTCAAAT ATCGGGTGCG GCGCGGCGAC GATACGGTAA 
551 CGTATTTCTT CGCACCGTCC CTGAACAATA TTCCGGCACA AATCGGCTAT 
601 ACCGACGACG GCAAAACCTA TACGCTGAAG CTCAAATCGG TGCAGATCAA 
651 CGGACAGGCC GCCAAACCGT AA 



This encodes a protein having amino acid sequence <SEQ ID 780>: 

1 MMKTFKNIFS AAILSAALPC AYAARLPQSA VLHYSGSYGI PATMTFERSG 

51 NAYKIVSTIK VPLYNIRFES GGTVVGNTLH PAYYKDIRRG KLYAEAKFAD 

101 GSVTYGKAGE SKTEQSPKAM DLFTLAWQLA ANDAKLPPGL KITNGKKLYS 

151 VGGLNKAGTG KYSIGGVETE VVKYRVRRGD DTVTYFFAPS LNNIPAQIGY 

201 TDDGKTYTLK LKSVQINGQA AKP* 

In comparison with ORF120-1, ORF120ng shows 97.8% identity in 223 aa overlap: 

10 20 30 40 50 60 

orf 120-1. pep MMKTFKNIFSAAILSAALPCAYAAGLPQSAVLHYSGSYGIPATMTFERSGNAYKIVSTIK 
I II II I II I I I I I t II I I I I I I I I M II I I I I I I I I I I I I I I I I I I I I t I II I I I I I II 
Orfl20ng MMKTFKNIFSAAILSAALPCAYAARLPQSAVLHYSGSYGIPATMTFERSGNAYKIVSTIK 

10 20 30 40 50 60 

70 80 90 100 110 120 

orf 120-1 . pep VPLYNIRFESGGTWGNTLHPTYYRDIRRGKLYAEAKFADGSVTYGKAGESKTEQSPKAM 
I It I I I II I I I I I II i I I M I : I I : I I 1 11 I I I II I I II I II 11 I II I 1 I I I II II t I II 
orfl20ng VPLYNIRFESGGTWGNTLHPAYYKDIRRGKLYAEAKFADGSVTYGKAGESKTEQSPKAM 

70 80 90 100 110 120 

130 140 150 160 170 180 

orf 120-1 . pep DLFTIAWQIAANDAKLPPGLKITNGKKLYSVGGLNKAGTGKYSIGGVETEVVKYRVRRGD 
t I I i t t I II M I I 1 I I t I I I I I I t I I I I I M I I t I I I I I I I I I I t t I I I I I I I I t I I t t I 
orfl20ng DLFTLAWQLAANDAKLPPGLKITNGKKLYSVGGLNKAGTGKYSIGGVETEWKYRVRRGD 
130 140 150 160 170 180 

190 200 210 220 

orf 120-1 . pep DAVMYFFAPSLNNIPAQIGYTDDGKTYTLKLKSVQINGQAAKPX 
1:1 t t I I t I I I I I I I I I I I I I I I I I I I I I I I I II I I I I I I i I I 
orfl20ng DTVTYFFAPSLNNIPAQIGYTDDGKTYTLKLKSVQINGQAAKPX 
190 200 210 220 



This analysis, including the presence of a putative leader sequence in the gonococcal protein, 
suggests that the proteins from N. meningitidis and N. gonorrhoeae, and their epitopes, could be 
useful antigens for vaccines or diagnostics, or for raising antibodies. 



Example 93 



The following partial DNA sequence was identified in N. meningitidis <SEQ ID 781>: 

1 ATGTATCGGA GGAAAGGGCG GGGCATCAAG CCGTGGATGG GTGCCGGTGC 

51 .GCGTTTGCC GCCTTGGTCT GGCTGGTTTT CGCGCTCGGC GATACTTTGA 

101 CTCCGTTTGC GGTTGCGGCG GTGCTGGCGT ATGTATTGGA CCCTTTGGTC 

151 GAATGGTTGC AGAAAAAGGG TTTGAACCGT GCATCCGCTT CGATGTCTGT 

201 GATGGTGTTT TCCTTGATTT TGTTGTTGGC ATTATTGTTG ATTATCGTCC 

251 CTATGCTGGT CGGGCAGTTC AACAATTTGG CATCGCGCCT GCCCCAATTA 

301 ATCGGTTTTA TGCAGAACAC GCTGCTGCCG TGGTTGAAAA ATACAATCGG 

351 CGGATATGTG GAAATCGATC AGGCATCTAT TATTGCGTGG CTTCAGGCGC 

401 ATACGGGAGA GTTGAGCAAC GCGCTTAAGG CGTGGTTTCC CGTTTTGATG 

451 AGGCAGGGCG GCAATATT. . 

This corresponds to the amino acid sequence <SEQ ID 782; ORF121>: 

1 MYRRKGRGIK PWMGAGXAFA ALVWLVFALG DTLTPFAVAA VLAYVLDPLV 
51 EWLQKKGLNR ASASMSVMVF SLILLLALLL IIVPMLVGQF NNLASRLPQL 
101 IGFMQNTLLP WLKNTIGGYV EIDQASIIAW LQAHTGELSN ALKAWFPVLM 
151 RQGGNI.. 

Further work revealed the complete nucleotide sequence <SEQ ID 783>: 

1 ATGTATCGGA GGAAAGGGCG GGGCATCAAG CCGTGGATGG GTGCCGGTGC 

51 GGCGTTTGCC GCCTTGGTCT GGCTGGTTTT CGCGCTCGGC GATACTTTGA 

101 CTCCGTTTGC GGTTGCGGCG GTGCTGGCGT ATGTATTGGA CCCTTTGGTC 

151 GAATGGTTGC AGAAAAAGGG TTTGAACCGT GCATCCGCTT CGATGTCTGT 

201 GATGGTGTTT TCCTTGATTT TGTTGTTGGC ATTATTGTTG ATTATCGTCC 

251 CTATGCTGGT CGGGCAGTTC AACAATTTGG CATCGCGCCT GCCCCAATTA 

301 ATCGGTTTTA TGCAGAACAC GCTGCTGCCG TGGTTGAAAA ATACAATCGG 

351 CGGATATGTG GAAATCGATC AGGCATCTAT TATTGCGTGG CTTCAGGCGC 

401 ATACGGGAGA GTTGAGCAAC GCGCTTAAGG CGTGGTTTCC CGTTTTGATG 

451 AGGCAGGGCG GCAATATTGT CAGCAGTATC GGCAACCTGC TGCTGCTTCC 

501 CTTGCTGCTT TACTATTTCC TGCTGGATTG GCAGCGGTGG TCGTGCGGCA 

551 TTGCCAAACT GGTTCCGAgG CGTTTTGCCG GTGCTTATAC GCGCATTACA 

601 GGCAATTTGA ACGAGGTATT GGGCGAATTT TTGCGCGGGC AGCTTCTGGT 

651 AATGCTGATT ATGGGCTTGG TTTACGGTTT GGGATTGGTG CTGGTCGGGC 

701 TGGATTCGGG GTTTGCCATC GGTATGCTTG CCGGTATTTT GGTGTTTGTC 

751 CCTTATCTCG GGGCGTTTAC GGGATTGCTG CTTGCCACCG TCGCCGCCTT 

801 GCTCCAGTTC GGTTCGTGGA ACGGCATCCT ATCGGTTTGG GCGGTTTTTG 

851 CCGTAGGACA GTTTCTCGAA AGTTTTTTCA TTACGCCGAA AATCGTGGGA 

901 GACCGTATCG GGCTGTCGCC GTTTTGGGTT ATCTTTTCGC TGATGGCGTT 

951 CGGGCAGCTG ATGGGCTTTG TCGGAATGTT GGCGGGATTG CCTTTGGCCG 

1001 CCGTAACCTT GGTCTTGCTT CGCGAGGGCG TGCAGAAATA TTTTGCCGGC 

1051 AGTTTTTACC GGGGCAGGTA G 

This corresponds to the amino acid sequence <SEQ ID 784; ORF121-1>: 

1 MYRRKGRGIK PWMGAGAAFA ALVWLVFALG DTLTPFAVAA VLAYVLDPLV 
51 EWLQKKGLNR ASASMSVMVF SLILLLALLL IIVPMLVGQF NNLASRLPQL 
101 IGFMQNTLLP WLKNTIGGYV EIDQASIIAW LQAHTGELSN ALKAWFPVLM 
151 RQGGNIVSSI GNLLLLPLLL YYFLLDWQRW SCGIAKLVPR RFAGAYTRIT 
201 GNLNEVLGEF LRGQLLVMLI MGLVYGLGLV LVGLDSGFAI GMLAGILVFV 
251 PYLGAFTGLL LATVAALLQF GSWNGILSVW AVFAVGQFLE SFFITPKIVG 
301 DRIGLSPFWV IFSLMAFGQL MGFVGMLAGL PLAAVTLVLL REGVQKYFAG 
351 SFYRGR* 

Computer analysis of this amino acid sequence gave the following results: 
Homology with a predicted ORF from N.meningitidis (strain A) 

ORF121 shows 98.7% identity over a 156 aa overlap with an ORF (ORF121a) from strain A of 
N.meningitidis: 

                      10        20        30        40        50        60
orf121.pep    MYRRKGRGIKPWMGAGXAFAALVWLVFALGDTLTPFAVAAVLAYVLDPLVEWLQKKGLNR
              ||||||||||||| || |||||||||||||||||||||||||||||||||||||||||||
orf121a       MYRRKGRGIKPWMDAGAAFAALVWLVFALGDTLTPFAVAAVLAYVLDPLVEWLQKKGLNR
                      10        20        30        40        50        60

                      70        80        90       100       110       120
orf121.pep    ASASMSVMVFSLILLLALLLIIVPMLVGQFNNLASRLPQLIGFMQNTLLPWLKNTIGGYV
              ||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
orf121a       ASASMSVMVFSLILLLALLLIIVPMLVGQFNNLASRLPQLIGFMQNTLLPWLKNTIGGYV
                      70        80        90       100       110       120

                     130       140       150
orf121.pep    EIDQASIIAWLQAHTGELSNALKAWFPVLMRQGGNI
              ||||||||||||||||||||||||||||||||||||
orf121a       EIDQASIIAWLQAHTGELSNALKAWFPVLMRQGGNIVSSIGNLLLLPLLLYYFLLDWQRW
                     130       140       150       160       170       180

orf121a       SCGIAKLVPRRFAGAYTRITGNLNEVLGEFLRGQLLVMLIMGLVYGLGLVLVGLDSGFAI
                     190       200       210       220       230       240
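
The identity figures quoted for these comparisons can be reproduced by counting matching positions over the overlap. The sketch below assumes the two sequences are already aligned without gaps, as is the case for the ORF121 and ORF121a comparison; `percent_identity` is an illustrative name and not part of the original analysis software. An undetermined residue (`X`) is counted as a mismatch.

```python
def percent_identity(a: str, b: str) -> float:
    """Percentage of identical residues over the overlap of two pre-aligned sequences."""
    overlap = min(len(a), len(b))
    if overlap == 0:
        raise ValueError("empty overlap")
    matches = sum(1 for x, y in zip(a, b) if x == y)   # zip stops at the shorter sequence
    return 100.0 * matches / overlap


# First 60 residues of ORF121 and ORF121a, taken from the alignment.
orf121 = "MYRRKGRGIKPWMGAGXAFAALVWLVFALGDTLTPFAVAAVLAYVLDPLVEWLQKKGLNR"
orf121a = "MYRRKGRGIKPWMDAGAAFAALVWLVFALGDTLTPFAVAAVLAYVLDPLVEWLQKKGLNR"
print(round(percent_identity(orf121, orf121a), 1))
```

Over the full 156-residue overlap the only differing positions are residues 14 and 17, which gives 154/156, i.e. the 98.7% stated in the text.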

The complete length ORF121a nucleotide sequence <SEQ ID 785> is: 

1 ATGTATCGGA GGAAAGGGCG GGGCATCAAG CCGTGGATGG ATGCCGGTGC 
51 GGCGTTTGCC GCCTTGGTCT GGCTGGTTTT CGCGCTCGGC GATACTTTGA 
101 CTCCGTTTGC GGTTGCGGCG GTGCTGGCGT ATGTATTGGA CCCTTTGGTC 
151 GAATGGTTGC AGAAAAAGGG TTTGAACCGT GCATCCGCTT CGATGTCTGT 
201 GATGGTGTTT TCCTTGATTT TGTTGTTGGC ATTATTGTTG ATTATTGTCC 
251 CTATGCTGGT CGGGCAGTTC AACAATTTGG CATCGCGCCT GCCCCAATTA 
301 ATCGGTTTTA TGCAGAACAC GCTGCTGCCG TGGTTGAAAA ATACAATCGG 
351 CGGATATGTG GAAATCGATC AGGCATCTAT TATTGCGTGG CTTCAGGCGC 
401 ATACGGGCGA GTTGAGCAAC GCGCTTAAGG CGTGGTTTCC CGTTTTGATG 
451 AGGCAGGGCG GCAATATTGT CAGCAGTATC GGCAACCTGC TGCTGCTTCC 
501 CTTGCTGCTT TACTATTTCC TGCTGGATTG GCAGCGGTGG TCGTGCGGCA 
551 TTGCCAAACT GGTTCCGAGG CGTTTTGCCG GTGCTTATAC GCGCATTACA 
601 GGCAATTTGA ACGAGGTATT GGGCGAATTT TTGCGCGGGC AGCTTCTGGT 
651 GATGCTGATT ATGGGTTTGG TTTACGGCTT GGGGTTGGTG CTGGTCGGGC 
701 TGGATTCGGG GTTTGCAATC GGTATGGTTG CCGGTATTTT GGTTTTTGTT 
751 CCCTATTTGG GCGCGTTTAC AGGACTGCTG CTGGCAACCG TCGCCGCCTT 
801 GCTCCAGTTC GGTTCGTGGA ACGGCATCTT GGCTGTTTGG GCGGTTTTTG 
851 CCGTAGGACA GTTTCTCGAA AGTTTTTTCA TTACGCCGAA AATCGTGGGA 
901 GACCGTATCG GCCTGTCGCC GTTTTGGGTT ATCTTTTCGC TGATGGCGTT 
951 CGGGCAGCTG ATGGGCTTTG TCGGAATGTT GGCCGGATTG CCTTTGGCCG 
1001 CCGTAACCTT GGTCTTGCTT CGCGAGGGCG TGCAGAAATA TTTTGCCGGC 
1051 AGTTTTTACC GGGGCAGGTA G 

This encodes a protein having amino acid sequence <SEQ ID 786>: 

1 MYRRKGRGIK PWMDAGAAFA ALVWLVFALG DTLTPFAVAA VLAYVLDPLV 
51 EWLQKKGLNR ASASMSVMVF SLILLLALLL IIVPMLVGQF NNLASRLPQL 
101 IGFMQNTLLP WLKNTIGGYV EIDQASIIAW LQAHTGELSN ALKAWFPVLM 
151 RQGGNIVSSI GNLLLLPLLL YYFLLDWQRW SCGIAKLVPR RFAGAYTRIT 
201 GNLNEVLGEF LRGQLLVMLI MGLVYGLGLV LVGLDSGFAI GMVAGILVFV 
251 PYLGAFTGLL LATVAALLQF GSWNGILAVW AVFAVGQFLE SFFITPKIVG 
301 DRIGLSPFWV IFSLMAFGQL MGFVGMLAGL PLAAVTLVLL REGVQKYFAG 
351 SFYRGR* 

ORF121a and ORF121-1 show 99.2% identity in 356 aa overlap: 



                      10        20        30        40        50        60
orf121a.pep   MYRRKGRGIKPWMDAGAAFAALVWLVFALGDTLTPFAVAAVLAYVLDPLVEWLQKKGLNR
              ||||||||||||| ||||||||||||||||||||||||||||||||||||||||||||||
orf121-1      MYRRKGRGIKPWMGAGAAFAALVWLVFALGDTLTPFAVAAVLAYVLDPLVEWLQKKGLNR
                      10        20        30        40        50        60

                      70        80        90       100       110       120
orf121a.pep   ASASMSVMVFSLILLLALLLIIVPMLVGQFNNLASRLPQLIGFMQNTLLPWLKNTIGGYV
              ||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
orf121-1      ASASMSVMVFSLILLLALLLIIVPMLVGQFNNLASRLPQLIGFMQNTLLPWLKNTIGGYV
                      70        80        90       100       110       120

                     130       140       150       160       170       180
orf121a.pep   EIDQASIIAWLQAHTGELSNALKAWFPVLMRQGGNIVSSIGNLLLLPLLLYYFLLDWQRW
              ||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
orf121-1      EIDQASIIAWLQAHTGELSNALKAWFPVLMRQGGNIVSSIGNLLLLPLLLYYFLLDWQRW
                     130       140       150       160       170       180

                     190       200       210       220       230       240
orf121a.pep   SCGIAKLVPRRFAGAYTRITGNLNEVLGEFLRGQLLVMLIMGLVYGLGLVLVGLDSGFAI
              ||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
orf121-1      SCGIAKLVPRRFAGAYTRITGNLNEVLGEFLRGQLLVMLIMGLVYGLGLVLVGLDSGFAI
                     190       200       210       220       230       240

                     250       260       270       280       290       300
orf121a.pep   GMVAGILVFVPYLGAFTGLLLATVAALLQFGSWNGILAVWAVFAVGQFLESFFITPKIVG
              ||:||||||||||||||||||||||||||||||||||:||||||||||||||||||||||
orf121-1      GMLAGILVFVPYLGAFTGLLLATVAALLQFGSWNGILSVWAVFAVGQFLESFFITPKIVG
                     250       260       270       280       290       300

                     310       320       330       340       350
orf121a.pep   DRIGLSPFWVIFSLMAFGQLMGFVGMLAGLPLAAVTLVLLREGVQKYFAGSFYRGRX
              |||||||||||||||||||||||||||||||||||||||||||||||||||||||||
orf121-1      DRIGLSPFWVIFSLMAFGQLMGFVGMLAGLPLAAVTLVLLREGVQKYFAGSFYRGRX
                     310       320       330       340       350



Homology with a predicted ORF from N.gonorrhoeae 

ORF121 shows 97.4% identity over a 156 aa overlap with a predicted ORF (ORF121ng) from 
N.gonorrhoeae: 

orf121.pep    MYRRKGRGIKPWMGAGXAFAALVWLVFALGDTLTPFAVAAVLAYVLDPLVEWLQKKGLNR 60
              |||||||||||||||| |||||||||:|||||||||||||||||||||||||||||||||
orf121ng      MYRRKGRGIKPWMGAGAAFAALVWLVYALGDTLTPFAVAAVLAYVLDPLVEWLQKKGLNR 60

orf121.pep    ASASMSVMVFSLILLLALLLIIVPMLVGQFNNLASRLPQLIGFMQNTLLPWLKNTIGGYV 120
              ||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
orf121ng      ASASMSVMVFSLILLLALLLIIVPMLVGQFNNLASRLPQLIGFMQNTLLPWLKNTIGGYV 120

orf121.pep    EIDQASIIAWLQAHTGELSNALKAWFPVLMRQGGNI 156
              ||||||||||:|||||||||||||||||||:|||||
orf121ng      EIDQASIIAWFQAHTGELSNALKAWFPVLMKQGGNIVSTIGNLLLPPLLLYYFLLDWHRW 180

An ORF121ng nucleotide sequence <SEQ ID 787> was predicted to encode a protein having amino 
acid sequence <SEQ ID 788>: 



1 MYRRKGRGIK PWMGAGAAFA ALVWLVYALG DTL TPFAVAA VLAYVLDPLV 

51 EWLQKKGLNR ASASMS VMVF SLILLLALLL IIV PMLVGQF NNLASRLPQL 

101 IGFMQNTLLP WLKNTIGGYV EIDQASIIAW FQAHTGELSN ALKAWFPVLM 

151 KQGGNIVS TI GNLLLPPLLL YYFLL DWHRW SCGIPKLVPR RFAGAYTRIT 

201 GNLNKVWGKF LRGQLLGETE RGAWCRVGR ECWEGGGARS RPSDDGWPRW 

251 GGG* 



Further work revealed the following gonococcal DNA sequence <SEQ ID 789>: 

1 ATGTATCGGA GAAAAGGACG GGGCATCAAG CCGTGGATGG GTGCCGGCGC 
51 GGCGTTTGCC GCCTTGGTCT GGCTGGTTTA CGCGCTCGGC GATACTTTGA 
101 CTCCGTTTGC GGTTGCGGCG GTGCTGGCGT ATGTGTTGGA CCCTTTGGTC 
151 GAATGGTTGC AGAAAAAGGG TTTGAACCGT GCATCCGCTT CGATGTCTGT 
201 GATGGTGTTT TCCTTGATTT TGTTGTTGGC ATTATTGTTG ATTATTGTCC 
251 CTATGCTGGT CGGGCAGTTC AATAATTTGG CATCTCGCCT GCCCCAATTA 
301 ATCGGTTTTA TGCAGAACAC GCTGCTGCCG TGGTTGAAAA ATACAATCGG 
351 CGGATATGTG GAAATCGATC AGGCATCTAT TATTGCGTGG TTTCAGGCGC 
401 ATACGGGCGA GTTGAGCAAC GCGCTTAAGG CGTGGTTTCC CGTTTTGATG 
451 AAACAGGGCG GCAATATTGT CAGCAGTATC GGCAACCTGC TGCTGCCGCC 
501 CTTGCTGCTT TACTATTTCC TGCTGGATTG GCAGCGGTGG TCGTGCGGCA 
551 TCGCCAAACT GGTTCCGAGG CGTTTTGCCG GTGCTTATAC GCGCATTACG 
601 GGTAATTTGA ACGAGGTATT GGGCGAATTT TTGCGCGGTC AGCTTCTGGT 
651 GATGCTGATT ATGGGCTTGG TTTACGGTTT GGGATTGATG CTAGTCGGAC 
701 TGGATTCGGG ATTTGCCATC GGTATGGTTG CCGGTATTTT GGTGTTTGTC 
751 CCCTATTTGG GTGCGTTTAC GGGATTGCTG CTTGCCACTG TTGCAGCCTT 
801 GCTCCAGTTC GGTTCGTGGA ACGGAATCTT GGCTGTTTGG GCGGTTTTTG 
851 CCGTCGGTCA GTTTCTCGAA AGTTTTTTCA TTACGCCGAA AATTGTAGGA 
901 GACCGTATCG GCCTGTCGCC GTTTTGGGTT ATCTTTTCGC TGATGGCGTT 
951 CGGAGAGCTG ATGGGCTTTG TCGGAATGTT GGCCGGATTG CCTTTGGCCG 
1001 CCGTAACCTT GGTCTTGCTT CGCGAGGGCG CGCAGAAATA TTTTGCCGGC 
1051 AGTTTTTACC GGGGCAGGTA G 






This corresponds to the amino acid sequence <SEQ ID 790; ORF121ng-1>: 

1 MYRRKGRGIK PWMGAGAAFA ALVWLVYALG DTLTPFAVAA VLAYVLDPLV 
51 EWLQKKGLNR ASASMSVMVF SLILLLALLL IIVPMLVGQF NNLASRLPQL 
101 IGFMQNTLLP WLKNTIGGYV EIDQASIIAW FQAHTGELSN ALKAWFPVLM 
151 KQGGNIVSSI GNLLLPPLLL YYFLLDWQRW SCGIAKLVPR RFAGAYTRIT 
201 GNLNEVLGEF LRGQLLVMLI MGLVYGLGLM LVGLDSGFAI GMVAGILVFV 
251 PYLGAFTGLL LATVAALLQF GSWNGILAVW AVFAVGQFLE SFFITPKIVG 
301 DRIGLSPFWV IFSLMAFGEL MGFVGMLAGL PLAAVTLVLL REGAQKYFAG 
351 SFYRGR* 



ORF121ng-1 and ORF121-1 show 97.5% identity in 356 aa overlap: 

                      10        20        30        40        50        60
orf121-1.pep  MYRRKGRGIKPWMGAGAAFAALVWLVFALGDTLTPFAVAAVLAYVLDPLVEWLQKKGLNR
              ||||||||||||||||||||||||||:|||||||||||||||||||||||||||||||||
orf121ng-1    MYRRKGRGIKPWMGAGAAFAALVWLVYALGDTLTPFAVAAVLAYVLDPLVEWLQKKGLNR
                      10        20        30        40        50        60

                      70        80        90       100       110       120
orf121-1.pep  ASASMSVMVFSLILLLALLLIIVPMLVGQFNNLASRLPQLIGFMQNTLLPWLKNTIGGYV
              ||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
orf121ng-1    ASASMSVMVFSLILLLALLLIIVPMLVGQFNNLASRLPQLIGFMQNTLLPWLKNTIGGYV
                      70        80        90       100       110       120

                     130       140       150       160       170       180
orf121-1.pep  EIDQASIIAWLQAHTGELSNALKAWFPVLMRQGGNIVSSIGNLLLLPLLLYYFLLDWQRW
              ||||||||||:|||||||||||||||||||:|||||||||||||| ||||||||||||||
orf121ng-1    EIDQASIIAWFQAHTGELSNALKAWFPVLMKQGGNIVSSIGNLLLPPLLLYYFLLDWQRW
                     130       140       150       160       170       180

                     190       200       210       220       230       240
orf121-1.pep  SCGIAKLVPRRFAGAYTRITGNLNEVLGEFLRGQLLVMLIMGLVYGLGLVLVGLDSGFAI
              |||||||||||||||||||||||||||||||||||||||||||||||||:||||||||||
orf121ng-1    SCGIAKLVPRRFAGAYTRITGNLNEVLGEFLRGQLLVMLIMGLVYGLGLMLVGLDSGFAI
                     190       200       210       220       230       240

                     250       260       270       280       290       300
orf121-1.pep  GMLAGILVFVPYLGAFTGLLLATVAALLQFGSWNGILSVWAVFAVGQFLESFFITPKIVG
              ||:||||||||||||||||||||||||||||||||||:||||||||||||||||||||||
orf121ng-1    GMVAGILVFVPYLGAFTGLLLATVAALLQFGSWNGILAVWAVFAVGQFLESFFITPKIVG
                     250       260       270       280       290       300

                     310       320       330       340       350
orf121-1.pep  DRIGLSPFWVIFSLMAFGQLMGFVGMLAGLPLAAVTLVLLREGVQKYFAGSFYRGRX
              ||||||||||||||||||:||||||||||||||||||||||||:|||||||||||||
orf121ng-1    DRIGLSPFWVIFSLMAFGELMGFVGMLAGLPLAAVTLVLLREGAQKYFAGSFYRGRX
                     310       320       330       340       350

In addition, ORF121ng-1 shows homology to a permease from H.influenzae: 

sp|P43969|PERM_HAEIN PUTATIVE PERMEASE PERM HOMOLOG Length = 349 
Score = 69.9 bits (168), Expect = 2e-11 

Identities = 67/317 (21%), Positives = 120/317 (37%), Gaps = 7/317 (2%) 

Query: 26  VYALGDTLTPFAVAAVLAYVLDPLVEWL-QKKGLNRASASMSVMVFSXXXXXXXXXXXVP 84
           +Y  GD + P  +A VL+Y+L+  + +L Q     R  A++ +               VP
Sbjct: 32  IYFFGDLIAPLLIALVLSYLLEIPINFLNQYLKCPRMLATILIFGSFIGLAAVFFLVLVP 91

Query: 85  MLVGQFNNLASRLPQLIGFMQNTLLPWLKNTIGGYVE-IDQASIIAWFQAHTGELSNALK 143
           ML  Q  +L S LP +     N    WL N    Y E ID + + + F +   ++    +
Sbjct: 92  MLWNQTISLLSDLPAMF----NKSNEWLLNLPKNYPELIDYSMVDSIFNSVREKILGFGE 147

Query: 144 AWFPVLMKQGGNIVSSIGNXXXXXXXXXXXXXDWQRWSCGIAKLVPRRFAGAYTRITGNL 203
           +   + +    N+VS                 D      G+++ +P+    A+ R    +
Sbjct: 148 SAVKLSLASIMNLVSLGIYAFLVPLMMFFMLKDKSELLQGVSRFLPKNRNLAFXRWK-EM 206

Query: 204 NEVLGEFLRGQXXXXXXXXXXXXXXXXXXXXDSGFAIGMVAGILVFVPYXXXXXXXXXXX 263
            + +  ++ G+                    +    +    G+ V VPY
Sbjct: 207 QQQISNYIHGKLLEILIVTLITYIIFLIFGLNYPLLLAFAVGLSVLVPYIGAVIVTIPVA 266

Query: 264 XXXXXQFGSWNGILAVWAVFAVGQFLESFFITPKIVGDRIGLSPFWVIFSLMAFGELMGF 323
                QFG       +   FAV Q L+   + P +  + + L P  +I S++ FG L GF
Sbjct: 267 LVALFQFGISPTFWYIIIAFAVSQLLDGNLLVPYLFSEAVNLHPLIIIISVLIFGGLWGF 326

Query: 324 VGMLAGLPLAAVTLVLL 340
            G+   +PLA +   ++
Sbjct: 327 WGVFFAIPLATLVKAVI 343
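
The percentages on the summary line follow directly from the counts and the alignment length. The sketch below recomputes them, assuming that the report truncates each percentage to an integer rather than rounding it (an assumption, adopted because truncation reproduces all three printed values, whereas rounding 120/317 would give 38%). The function name `blast_percentages` is hypothetical.

```python
def blast_percentages(identities: int, positives: int, gaps: int, length: int) -> dict:
    """Integer percentages for a BLAST-style summary line (truncated, by assumption)."""
    if length <= 0:
        raise ValueError("alignment length must be positive")
    counts = {"identities": identities, "positives": positives, "gaps": gaps}
    return {name: 100 * count // length for name, count in counts.items()}


# Counts taken from the summary line above.
print(blast_percentages(identities=67, positives=120, gaps=7, length=317))
```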

Based on this analysis, including the presence of a putative leader sequence and transmembrane 
domains in the two proteins, it is predicted that the proteins from N.meningitidis and 
N.gonorrhoeae, and their epitopes, could be useful antigens for vaccines or diagnostics, or for 
raising antibodies. 
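
The text does not state which program was used to predict the transmembrane domains. As an illustration only, candidate membrane-spanning stretches can be located with a sliding-window hydropathy average. The sketch below uses the Kyte-Doolittle scale with a 19-residue window and a threshold of 1.6; this is a swapped-in technique chosen for the example, the names `hydropathy_windows` and `candidate_tm_starts` are hypothetical, and the results should not be read as the predictions reported in this document.

```python
# Kyte-Doolittle hydropathy values for the 20 standard amino acids.
KD = {"A": 1.8, "R": -4.5, "N": -3.5, "D": -3.5, "C": 2.5, "Q": -3.5, "E": -3.5,
      "G": -0.4, "H": -3.2, "I": 4.5, "L": 3.8, "K": -3.9, "M": 1.9, "F": 2.8,
      "P": -1.6, "S": -0.8, "T": -0.7, "W": -0.9, "Y": -1.3, "V": 4.2}


def hydropathy_windows(protein: str, window: int = 19) -> list:
    """Mean hydropathy of every window; unknown residues (e.g. 'X') score 0."""
    values = [KD.get(residue, 0.0) for residue in protein.upper()]
    return [sum(values[i:i + window]) / window
            for i in range(len(values) - window + 1)]


def candidate_tm_starts(protein: str, window: int = 19, threshold: float = 1.6) -> list:
    """1-based start positions of windows whose mean hydropathy reaches the threshold."""
    scores = hydropathy_windows(protein, window)
    return [i + 1 for i, score in enumerate(scores) if score >= threshold]


# First 100 residues of ORF121-1 (SEQ ID 784).
orf121_1_head = ("MYRRKGRGIKPWMGAGAAFAALVWLVFALGDTLTPFAVAAVLAYVLDPLV"
                 "EWLQKKGLNRASASMSVMVFSLILLLALLLIIVPMLVGQFNNLASRLPQL")
print(candidate_tm_starts(orf121_1_head))
```

The basic N-terminal window scores well below the threshold, while the leucine- and isoleucine-rich stretch beginning at residue 71 scores well above it.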



Example 94 

The following partial DNA sequence was identified in N.meningitidis <SEQ ID 791>: 

1 . .ACTGCTTTTT CGGCGGCGCT GCGCTTGAGT CCATCATGAC TCGTCATATT 
51 TTTGTCCTTT GGGAAACCGT ATCAACAAAC AGCCGCCATC TTAACATTTT 
101 TTTGCACGTC CTGCCCGCCG CGTTCAAATG CGTACCAGCA ATACCGCCGC 
151 CTGCGCCTCT ATGCCTTCCA TCCGCCCGAG ATAGCCGAGT TTTTCGTTGG 
201 TTTTGCCTTT GATGTTGACG CACGAAATGT CTATGCCCAA ATCGGCGGCG 
251 ATGTTGGCAC GCATTTGCGG AATGTGCGGC GCGAGTGTGG GTTTCTGTGC 
301 AATCACGGTC GTATCGACAT TGACCGCCTG CCAACCCTGC GCCTGAACGC 
351 TTTGATACGC CGCACGCAAA AGGACGCGGC TGTCCGCATC TTTGAACTCT 
401 GCGGCGGTGT CGGGGAAATG GCTGCCGATA TCGCCCAAAC CTGCCGCACC 
451 GAGCAGCGCG TCGGTAACGG CGTGCAGCAG CGCATCGGCA TCGGAGTGTC 
501 CGAGCAGCCC TTTTTCAAAT GGGATTTCAA CTCCGCCAAG TATCAG. . 

This corresponds to the amino acid sequence <SEQ ID 792; ORF122>: 



1 ..TAFSAALRLS PSXLVIFLSF GKPYQQTAAI LTFFCTSCPP RSNAYQQYRR 

51 LRLYAFHPPE IAEFFVGFAF DVDARNVYAQ IGGDVGTHLR NVRRECGFLC 

101 NHGRIDIDRL PTLRLNALIR RTQKDAAVRI FELCGGVGEM AADIAQTCRT 

151 EQRVGNGVQQ RIGIGVSEQP FFKWDFNSAK YQ. . 

Further work revealed the complete nucleotide sequence <SEQ ID 793>: 



1 ATATCGTACT GGGCAAGCAG TTCGCCGGAT TTTTTGGAAG TAGATACCGC 

51 GCCTTTGATT TTTTTGCCGC TCTTACCCAA GGCTTCGATG AAAAAGTTGA 

101 TGGTCGAGCC GGTACCGATG CCGATATATT CATTTTCGGG TACGAATTCG 

151 ACTGCTTTTT CGGCGGCGAT GCGCTTGAGT TCGTCTTGTG TCGTCATATT 

201 TTTGTCCTTT GGGAAACCGT ATCAACAAAC AGCCGCCATC TTAACATTTT 

251 TTTGCACGTC CTGCCCGCCG CGTTCAAATG CGTACCAGCA ATACCGCCGC 

301 CTGCGCCTCT ATGCCTTCCA TCCGCCCGAG ATAGCCGAGT TTTTCGTTGG 

351 TTTTGCCTTT GATGTTGACG CACGAAATGT CTATGCCCAA ATCGGCGGCG 

401 ATGTTGGCAC GCATTTGCGG AATGTGCGGC GCGAGTTTGG GTTTCTGTGC 

451 AATCACGGTC GTATCGACAT TGACCGCCTG CCAACCCTGC GCCTGAACGC 

501 TTTGATACGC CGCACGCAAA AGGACGCGGC TGTCCGCATC TTTGAACTCT 

551 GCGGCGGTGT CGGGGAAATG GCTGCCGATA TCGCCCAAAC CTGCCGCACC 

601 GAGCAGCGCG TCGGTAACGG CGTGCAGCAG CGCATCGGCA TCGGAGTGTC 

651 CGAGCAGCCC TTTTTCAAAT GGGATTTCAA CTCCGCCAAG TATCAGCTTT 

701 CTGCCTTCGG TCAGTTGGTG GACATCGTAG CCCTGTCCGA TACGGATGTT 

751 CGTCATCGTT TGTGTTCCTG A 
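
The relationship between the partial sequence and the complete sequence can be checked by locating one inside the other. The following sketch anchors on an exact match of the first ten bases and then counts the positions that differ; the function name `locate`, the seed length and the exact-seed strategy are assumptions made for illustration and do not describe how the sequences were originally assembled.

```python
def locate(partial: str, complete: str, seed_len: int = 10):
    """Return (0-based offset, mismatch count) of `partial` within `complete`.

    The offset is found from an exact match of the first `seed_len` bases;
    None is returned if that seed does not occur in `complete`.
    """
    p = "".join(partial.split()).upper()
    c = "".join(complete.split()).upper()
    offset = c.find(p[:seed_len])
    if offset < 0:
        return None
    mismatches = sum(1 for x, y in zip(p, c[offset:]) if x != y)
    return offset, mismatches


# First 50 bases of the partial sequence (SEQ ID 791).
partial_head = "ACTGCTTTTT CGGCGGCGCT GCGCTTGAGT CCATCATGAC TCGTCATATT"
# First 200 bases of the complete sequence (SEQ ID 793).
complete_head = """
ATATCGTACT GGGCAAGCAG TTCGCCGGAT TTTTTGGAAG TAGATACCGC
GCCTTTGATT TTTTTGCCGC TCTTACCCAA GGCTTCGATG AAAAAGTTGA
TGGTCGAGCC GGTACCGATG CCGATATATT CATTTTCGGG TACGAATTCG
ACTGCTTTTT CGGCGGCGAT GCGCTTGAGT TCGTCTTGTG TCGTCATATT
"""
print(locate(partial_head, complete_head))
```

An offset of 150 corresponds to position 151 in the numbered listing, where the complete sequence begins to overlap the partial one.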

This corresponds to the amino acid sequence <SEQ ID 794; ORF122-1>: 

1 ISYWASSSPD FLEVDTAPLI FLPLLPKASM KKLMVEPVPM PIYSFSGTNS 
51 TAFSAAMRLS SSCVVIFLSF GKPYQQTAAI LTFFCTSCPP RSNAYQQYRR 
101 LRLYAFHPPE IAEFFVGFAF DVDARNVYAQ IGGDVGTHLR NVRREFGFLC 
151 NHGRIDIDRL PTLRLNALIR RTQKDAAVRI FELCGGVGEM AADIAQTCRT 
201 EQRVGNGVQQ RIGIGVSEQP FFKWDFNSAK YQLSAFGQLV DIVALSDTDV 
251 RHRLCS* 

Computer analysis of this amino acid sequence gave the following results: 

Homology with a predicted ORF from N.meningitidis (strain A) 

ORF122 shows 94.0% identity over a 182 aa overlap with an ORF (ORF122a) from strain A of 
N.meningitidis: 

10 20 30 

orf 122 .pep TAFSAALRLSPSXLVIFLSFGKPYQQTAAI 

111111:111 i : I I I i I I I I 1 I 1 I i I I t 
orf 122a FLPLLPKASMKKLMVEPVPMPMYSFSGTNSTAFSAAMRLSSSCWIFLSFGKPYQQTAAI 
30 40 50 60 70 80 

40 50 60 70 80 90 

orf 122 . pep LTFFCTSCPPRSNAYQQYRRLRLYAFHPPEIAEFFVGFAFDVDARNVYAQIGGDVGTHLR 
nil llllllll Mlllllllllll 111:11111111 I I II t I I I II t I I I I I II I 
orf 122a LTFFXTSCPPRSNPYQQYRRLRLYAFHAPEITEFFVGFAFXVDARNVYAQIGGDVGTHLR 

90 100 110 120 130 140 



100 110 120 130 140 150 

orf 122 . pep NVRRECGFLCNHGRIDIDRLPTLRLNALIRRTQKDAAVRIFELCGGVGEMAADIAQTCRT 
I : I I I I II I M I II i II I 1 II I II t I II II I I I I t I II II I I I t I I I II ) I I I M I It I 
orf 122a NMRREFGFLCNHGRIDIDRLPTLRLNALIRRTQKDAAVRIFELCGGVGEMAADIAQTCRT 

150 160 170 180 190 200 



160 170 180 

orf 122 . pep EQRVGNGVQQRIGIGVSEQPFFKWDFNSAKYQ 
I i I I I t I II I I II i II II I I I I I I I I I I I I I I 
orf 122a EQRVGNGVQQRIGIGVSEQPFFKWDFNSAKYQLSAFGQLVDXVALSDTDVRHRLCSX 
210 220 230 240 250 

The complete length ORF122a nucleotide sequence <SEQ ID 795> is: 

1 ATATCATATT GGGCAAGCAG TTCACTGGAT TTTTTGGAAG TAGATACCGC 
51 GCCTTTGATT TTTTTGCCGC TCTTACCCAA GGCTTCGATG AAAAAGTTGA 
101 TGGTCGAACC GGTACCGATG CCGATGTATT CGTTTTCGGG TACGAATTCG 
151 ACTGCNTTTT CGGCGGCGAT GCGCTTGAGT TCGTCTTGTG TCGTCATATT 
201 TTTGTCCTTT GGGAAACCGT ATCAACAAAC AGCCGCCATC TTAACATTTT 
251 TTNNNACGTC CTGCCCGCCG CGTTCAAATC CTTACCAGCA ATACCGCCGC 
301 CTGCGACTCT ATGCCTTCCA TGCGCCCGAG ATAACCGAGT TTTTCGTTGG 
351 TTTTGCCTTT GANGTTGACG CACGAAATGT CTATGCCCAA ATCGGCGGCG 
401 ATGTTGGCAC GCATTTGCGG AATATGCGGC GCGAGTTTGG GTTTCTGTGC 
451 AATCACGGTC GTATCGACAT TGACCGCCTG CCAACCCTGC GCCTGAACGC 
501 TTTGATACGC CGCACGCAAA AGGACGCGGC TGTCCGCATC TTTGAACTCT 
551 GCGGCGGTGT CGGGGAAATG GCTGCCGATA TCGCCCAAAC CTGCCGCACC 
601 GAGCAGCGCG TCGGTAACGG CGTGCAGCAG CGCATCGGCA TCGGAGTGTC 
651 CGAGCAGCCC TTTTTCAAAT GGGATTTCAA CTCCGCCAAG TATCAGCTTT 
701 CTGCCTTCGG TCAGTTGGTG GACATCGTAG CCCTGTCCGA TACGGATGTT 
751 CGTCATCGTT TGTGTTCCTG A 



This encodes a protein having amino acid sequence <SEQ ID 796>: 



1 ISYWASSSLD FLEVDTAPLI FLPLLPKASM KKLMVEPVPM PMYSFSGTNS 

51 TAFSAAMRLS SSCVVIFLSF GKPYQQTAAI LTFFXTSCPP RSNPYQQYRR 

101 LRLYAFHAPE ITEFFVGFAF XVDARNVYAQ IGGDVGTHLR NMRREFGFLC 

151 NHGRIDIDRL PTLRLNALIR RTQKDAAVRI FELCGGVGEM AADIAQTCRT 

201 EQRVGNGVQQ RIGIGVSEQP FFKWDFNSAK YQLSAFGQLV DIVALSDTDV 

251 RHRLCS* 



ORF122a and ORF122-1 show 96.9% identity in 256 aa overlap: 



                      10        20        30        40        50        60
orf122a.pep   ISYWASSSLDFLEVDTAPLIFLPLLPKASMKKLMVEPVPMPMYSFSGTNSTAFSAAMRLS
              |||||||| ||||||||||||||||||||||||||||||||:||||||||||||||||||
orf122-1      ISYWASSSPDFLEVDTAPLIFLPLLPKASMKKLMVEPVPMPIYSFSGTNSTAFSAAMRLS
                      10        20        30        40        50        60

                      70        80        90       100       110       120
orf122a.pep   SSCVVIFLSFGKPYQQTAAILTFFXTSCPPRSNPYQQYRRLRLYAFHAPEITEFFVGFAF
              |||||||||||||||||||||||| |||||||| ||||||||||||| |||:||||||||
orf122-1      SSCVVIFLSFGKPYQQTAAILTFFCTSCPPRSNAYQQYRRLRLYAFHPPEIAEFFVGFAF
                      70        80        90       100       110       120

                     130       140       150       160       170       180
orf122a.pep   XVDARNVYAQIGGDVGTHLRNMRREFGFLCNHGRIDIDRLPTLRLNALIRRTQKDAAVRI
               ||||||||||||||||||||:||||||||||||||||||||||||||||||||||||||
orf122-1      DVDARNVYAQIGGDVGTHLRNVRREFGFLCNHGRIDIDRLPTLRLNALIRRTQKDAAVRI
                     130       140       150       160       170       180

                     190       200       210       220       230       240
orf122a.pep   FELCGGVGEMAADIAQTCRTEQRVGNGVQQRIGIGVSEQPFFKWDFNSAKYQLSAFGQLV
              ||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
orf122-1      FELCGGVGEMAADIAQTCRTEQRVGNGVQQRIGIGVSEQPFFKWDFNSAKYQLSAFGQLV
                     190       200       210       220       230       240

                     250
orf122a.pep   DIVALSDTDVRHRLCSX
              |||||||||||||||||
orf122-1      DIVALSDTDVRHRLCSX
                     250

Homology with a predicted ORF from N.gonorrhoeae 

ORF122 shows 89.6% identity over a 182 aa overlap with a predicted ORF (ORF122ng) from 
N.gonorrhoeae: 

orf122.pep                                  TAFSAALRLSPSXLVIFLSFGKPYQQTAAI 30
                                            ||||||:||| | :||||||||||||||||
orf122ng      FLPLLPKASMKKLMVEPVPMPMYSFSGTNSTAFSAAMRLSSSCVVIFLSFGKPYQQTAAI 80

orf122.pep    LTFFCTSCPPRSNAYQQYRRLRLYAFHPPEIAEFFVGFAFDVDARNVYAQIGGDVGTHLR 90
              ||||||| ||||| |||||||||||||||||||||||||||:||||: :|||||||||||
orf122ng      LTFFCTSWPPRSNPYQQYRRLRLYAFHPPEIAEFFVGFAFDIDARNIDTQIGGDVGTHLR 140

orf122.pep    NVRRECGFLCNHGRIDIDRLPTLRLNALIRRTQKDAAVRIFELCGGVGEMAADIAQTCRT 150
              ||| | ||||||||||||:|||||||||||||||||||||||||||||:||||:||||||
orf122ng      NVRCEFGFLCNHGRIDIDHLPTLRLNALIRRTQKDAAVRIFELCGGVGKMAADVAQTCRT 200

orf122.pep    EQRVGNGVQQRIGIGVSEQPFFKWDFNSAKYQ 182
              |||||||||||:|| : |||||||||||||||
orf122ng      EQRVGNGVQQRVGIRMPEQPFFKWDFNSAKYQLSAFGQLVDIVALSDTDIRHRLCS 256



The complete length ORF122ng nucleotide sequence <SEQ ID 797> is: 



1 ATGTCGTACC GGGCAAGCAG TTCGCCGGAT TTTTTGGAGG TTGAAACCGC 
51 GCCTTTGATT TTTTTACCGC TTTTGCCCAA GGCTTCGATG AAGAAATTGa 
101 tgGTCGAACC GgtaCCGATG CCGATGTATT CGTTTTCGGG TACGAATTCG 
151 ACTGCTTTTT CGGCGGCGAT GCGCttgAgt TCgtcttgcg TcgTCATATT 
201 TTTAtccttt gGGAAaccct atcaAcaAAc agccgccatC TTAACATTTT 
251 TTTGCACGtC ctggccgccg cgttcaAATc cgtaccaGca ataccgccgc 
301 ctgcgcctCT AtgcCTTCCA TCCGCCCGAG ATAGCCGAGT TTTTCGTTGG 
351 TTTTGCCTTT GATatTGACG CACGAAATAT CGatacCCAa atcggcgGCG 
401 ATGTTGGCAC GCATTTGCGG AATGTGCGGT GCGAGTTTGG GTTTCTGTGC 
451 AATCACGGTC GTATCGACAT TGACCACCTG CCAACCCTGC GCCTGAACGC 
501 TTTGATACGC CGCACGCAAA AGGACGCGGC TGTCCGCATC TTTGAACTCT 
551 GCGGCGGTGT CGGGAAAATG GCTGCCGATG TCGCCCAAAC CTGCCGCACC 
601 GAGCAGCgcg tcggtaaCGG CGTGCAGCAG cgcgTcgGCA TCCGAATGCC 
651 CGAGCAGCCC TTTTTCAAAT GGGATTTCAA CTCCGCCAAG TATCAGCTTT 
701 CTGCCTTCGG TCAATTGGTG GACATCGTAG CCCTGTCCGA TACGGATATT 
751 CGTCATCGTT TGTGTTCCTG A 



This encodes a protein having amino acid sequence <SEQ ID 798>: 



1 MSYRASSSPD FLEVETAPLI FLPLLPKASM KKLMVEPVPM PMYSFSGTNS 
51 TAFSAAMRLS SSCVVIFLSF GKPYQQTAAI LTFFCTSWPP RSNPYQQYRR 
101 LRLYAFHPPE IAEFFVGFAF DIDARNIDTQ IGGDVGTHLR NVRCEFGFLC 
151 NHGRIDIDHL PTLRLNALIR RTQKDAAVRI FELCGGVGKM AADVAQTCRT 
201 EQRVGNGVQQ RVGIRMPEQP FFKWDFNSAK YQLSAFGQLV DIVALSDTDI 
251 RHRLCS* 

ORF122ng and ORF122-1 show 92.6% identity in 256 aa overlap: 

10 20 30 40 50 60 

orf 122-1 . pep ISYWASSSPDFLEVDTAPLIFLPLLPKASMKKLMVEPVPMPIYSFSGTNSTAFSAAMRLS 
:M llltlltill:lllltllll)llllltlllltlllll:IIIIIIIMilllMIM 
orfl22ng MSYRASSSPDFLEVETAPLIFLPLLPKASMKKLMVEPVPMPMYSFSGTNSTAFSAAMRLS 

10 20 30 40 50 60 

70 80 90 100 110 120 

orf 122-1 . pep SSCWIFLSFGKPYQQTAAILTFFCTSCPPRSNAYQQYRRLRLYAFHPPEIAEFFVGFAF 
lllltlllltllltlllltllMIMI Mill litlliMliMIIMltlllMIII 
orfl22ng SSCWIFLSFGKPYQQTAAILTFFCTSWPPRSNPYQQYRRLRLYAFHPPEJAEFFVGFAF 

70 80 90 100 110 120 

130 140 150 160 170 180 

orf 122-1 . pep DVDARNVYAQIGGDVGTHLRNVRREFGFLCNHGRIDIDRLPTLRLNALIRRTQKDAAVRI 
):|tll: :|ltllllttlllll I I I I I I I I M I I t I : I t I I I I I I I I I I I I I I I I I I I 
orfl22ng DIDARNIDTQIGGDVGTHLRNVRCEFGFLCNHGRIDIDHLPTLRLNALIRRTQKDAAVRI 

130 140 150 160 170 180 

190 200 210 220 230 240 

orf 122-1 . pep FELCGGVGEMAADIAQTCRTEQRVGNGVQQRIGIGVSEQPFFKWDFNSAKYQLSAFGQLV 
Ilttlll|:|lli:llllllllli)tlll)l:lt : I I I I I I I t I i I I I I t I I t i I I I I 
orfl22ng FELCGGVGKMAADVAQTCRTEQRVGNGVQQRVGIRMPEQPFFKWDFNSAKYQLSAFGQLV 

190 200 210 220 230 240 

250 

orf 122-1. pep DIVALSDTDVRHRLCSX 
I I I t t i 1 I I : i I I i I I t 
orfl22ng DIVALSDTDIRHRLCSX 

250 

Based on this analysis, it is predicted that the proteins from N.meningitidis and N.gonorrhoeae, and 
their epitopes, could be useful antigens for vaccines or diagnostics, or for raising antibodies. 



Example 95 



The following partial DNA sequence was identified in N.meningitidis <SEQ ID 799>: 



1 . . GCCGGCGCGA GTGCGAACAA CATTTCCGCG CGTTTTGCGG AAACACCCGT 

51 CGCTGTCAGC GTTACCCTGA TCGGCACGGT ACTTGCCGTC ATGCTGCCCG 

101 TTACCGAATA TGAAAACTTC CTGCTGCTTA TCGGCTCGGT ATTTGCGCCG 

151 ATGGGGCGGA JTTTGATTGC CGACTTTTTC GTCTTGAAAC GGCGTGA 

This corresponds to the amino acid sequence <SEQ ID 800; ORF125>: 

1 ..AGASANNISA RFAETPVAVS VTLIGTVLAV MLPVTEYENF LLLIGSVFAP 

51 MGGFDCRLFR LETA* 



Further work revealed the complete nucleotide sequence <SEQ ID 801>: 

1 ATGTCGGGCA ATGCCTCCTC TCCTTCATCT TCCTCCGCCA TCGGGCTGAT 

51 TTGGTTCGGC GCGGCGGTAT CGATTGCCGA AATCAGCACG GGTACGCTGC 

101 TTGCGCCTTT GGGCTGGCAG CGCGGTCTGG CGGCTCTACT TTTGGGTCAT 

151 GCCGTCGGCG GCGCGCTGTT TTTTGCGGCG GCGTATATCG GCGCACTGAC 

201 CGGACGCAGC TCGATGGAAA GCGTGCGCCT GTCGTTCGGC AAACGCGGTT 

251 CAGTGCTGTT TTCCGTGGCG AATATGCTGC AACTGGCCGG CTGGACGGCG 

301 GTGATGATTT ACGCCGGCGC AACGGTCAGC TCCGCTTTGG GCAAAGTGTT 

351 GTGGGACGGC GAATCTTTTG TCTGGTGGGC ATTGGCAAAC GGCGCGCTGA 

401 TTGTGCTGTG GCTGGTTTTC GGCGCACGCA AAACAGGCGG GCTGAAAACC 

451 GTTTCGATGC TGCTGATGCT GTTGGCGGTT CTGTGGCTGA GTGCCGAAGT 

501 CTTTTCCACG GCAGGCAGCA CCGCCGCACA GGTTTCAGAC GGCATGAGTT 

551 TCGGAACGGC AGTCGAGCTG TCCGCCGTGA TGCCGCTTTC CTGGCTGCCG 

601 CTTGCCGCCG ACTACACGCG CCACGCGCGC CGCCCGTTTG CGGCAACCCT 

651 GACGGCAACG CTCGCCTACA CGCTGACCGG CTGCTGGATG TATGCCTTGG 

701 GTTTGGCAGC GGCGTTGTTC ACCGGAGAAA CCGACGTGGC AAAAATCCTG 

751 CTGGGCGCAG GTTTGGGTGC GGCAGGCATT TTGGCGGTCG TCCTCTCCAC 
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801 CGTTACCACA ACGTTTCTCG ATGCCTATTC CGCCGGCGCG AGTGCGAACA 

851 ACATTTCCGC GCGTTTTGCG GAAACACCCG TCGCTGTCGG CGTTACCCTG 

901 ATCGGCACGG TACTTGCCGT CATGCTGCCC GTTACCGAAT ATGAAAACTT 

951 CCTGCTGCTT ATCGGCTCGG TATTTGCGCC GATGGCGGCG GTTTTGATTG 

1001 CCGACTTTTT CGTCTTGAAA CGGCGTGAGG AGATTGAAGG CTTTGACTTT 

1051 GCCGGACTGG TTCTGTGGCT TGCGGGCTTC ATCCTCTACC GCTTCCTGCT 

1101 CTCGTCCGGC TGGGAAAGCA GCATCGGTCT GACCGCCCCC GTAATGTCTG 

1151 CCGTTGCCAT TGCCACCGTA TCGGTACGCC TTTTCTTTAA AAAAACCCAA 

1201 TCTTTACAAA GGAACCCGTC ATGA 

This corresponds to the amino acid sequence <SEQ ID 802; ORF125-1>: 

1 MSGNASSPSS SSAIGLIWFG AAVSIAEIST GTLLAPLGWQ RGLAALLLGH 
51 AVGGALFFAA AYIGALTGRS SMESVRLSFG KRGSVLFSVA NMLQLAGWTA 
101 VMIYAGATVS SALGKVLWDG ESFVWWALAN GALIVLWLVF GARKTGGLKT 
151 VSMLLMLLAV LWLSAEVFST AGSTAAQVSD GMSFGTAVEL SAVMPLSWLP 
201 LAADYTRHAR RPFAATLTAT LAYTLTGCWM YALGLAAALF TGETDVAKIL 
251 LGAGLGAAGI LAVVLSTVTT TFLDAYSAGA SANNISARFA ETPVAVGVTL 
301 IGTVLAVMLP VTEYENFLLL IGSVFAPMAA VLIADFFVLK RREEIEGFDF 
351 AGLVLWLAGF ILYRFLLSSG WESSIGLTAP VMSAVAIATV SVRLFFKKTQ 
401 SLQRNPS* 

Computer analysis of this amino acid sequence gave the following results: 



Homology with a predicted ORF from N.meningitidis (strain A) 

ORF125 shows 76.5% identity over a 51 aa overlap with an ORF (ORF125a) from strain A of 
N.meningitidis: 

                                                    10        20        30
orf125.pep                                  AGASANNISARFAETPVAVSVTLIGTVLAV
                                            ||:|||||||:::| |:||:|:::||:|||
orf125a       KILLGAGLGAAGILAVVLSTVTTTFLDAYSAGVSANNISAKLSEIPIAVAVAVVGTLLAV
                     250       260       270       280       290       300

                      40        50        60
orf125.pep    MLPVTEYENFLLLIGSVFAPMGGFDCRLFRLETAX
              :||||||||||||||||||||
orf125a       LLPVTEYENFLLLIGSVFAPMAAVLIADFFVLKRREEIEG
                     310       320       330       340

The ORF125a partial nucleotide sequence <SEQ ID 803> is: 



1 ATGTCGGGCA ATGCCTCCTC TCNTTCATCT TCCGCCGCCA TCGGGCTGAT 
51 TTGGTTCGGC GCGGCGGTAT CGATTGCCGA AATCAGCACG GGTACACTGC 
101 TTGCGCCTTT GGGCTGGCAG CGCGGTCTGG CNGCTCTGCT TTTGGGTCAT 
151 GCCGTCGGCG GCGCGCTGTT TTTTGCGGCG GCGTATATCG GCGCACTGAC 
201 CGGACNCANC TCGATGGAAA GCGTGCGCCT GTCGTTCGGC AAACGCGGTT 
251 CAGTGCTGTT TTCCGTGGCG AATATGCTGC AACTGGCCGG CTGGACGGCG 
301 GTGATGATTT ACGCCGGCGC AACGGTCAGC TCCGCTTTGG GCAAAGTGTT 
351 GTGGGACGGC GAATCTTTTG TCTGGTGGGC ATTGGCAAAC GGCGCGCTGA 
401 TTGTGCTGTG GCTGGTTTTC GGCGCACGCA AAACAGGCGG GCTGAAAACC 
451 GTTTCGATGC TGCTGATGCT GTTGGCGGTT CTGTGGCTGA GTGCCGAANT 
501 NTTTTCCACG GCAGGCAGCA CCGCCGCANN GGTNNCAGAC GGCATGAGTT 
551 TCGGAACGGC AGTCGAGCTG TCCGCCGTNA TGCCGCTTTC TTGGCTGCCG 
601 CTGGCCGCCG ACTACACGCG CCACGCGCGC CGCCCGTTTG CGGCAACCCT 
651 GACGGCAACG CTCGCCTACA CGCTGACCGG CTGCTGGATG TATGCCTTGG 
701 GTTTGGCAGC GGCGTTGTTC ACCGGAGAAA CCGACGTGGC AAAAATCCTG 
751 CTGGGCGCAG GTTTGGGTGC GGCAGGCATT TTGGCGGTCG TCCTGTCGAC 
801 CGTTACCACC ACTTTTCTCG ATGCNTACTC CGCCGGCGTA AGTGCGAACA 
851 ATATTTCCGC CAAACTTTCG GAAATACCNA TCGCCGTTGC CGTCGCCGTT 
901 GTCGGCACAC TGCTTGCCGT CCTCCTGCCC GTTACCGAAT ATGAAAACTT 
951 CCTGCTGCTT ATCGGCTCGG TATTTGCGCC GATGGCGGCG GTTTTGATTG 
1001 CCGACTTTTT CGTCTTGAAA CGGCGTGAGG AGATTGAAGG C.. 



This encodes a protein having the partial amino acid sequence <SEQ ID 804>: 



1 MSGNASSXSS SAAIGLIWFG AAVSIAEIST GTLLAPLGWQ RGLAALLLGH 
51 AVGGALFFAA AYIGALTGXX SMESVRLSFG KRGSVLFSVA NMLQLAGWTA 
101 VMIYAGATVS SALGKVLWDG ESFVWWALAN GALIVLWLVF GARKTGGLKT 
151 VSMLLMLLAV LWLSAEXFST AGSTAAXVXD GMSFGTAVEL SAVMPLSWLP 
201 LAADYTRHAR RPFAATLTAT LAYTLTGCWM YALGLAAALF TGETDVAKIL 
251 LGAGLGAAGI LAVVLSTVTT TFLDAYSAGV SANNISAKLS EIPIAVAVAV 
301 VGTLLAVLLP VTEYENFLLL IGSVFAPMAA VLIADFFVLK RREEIEG.. 



ORF125a and ORF125-1 show 94.5% identity in 347 aa overlap: 



                      10        20        30        40        50        60
orf125a.pep   MSGNASSXSSSAAIGLIWFGAAVSIAEISTGTLLAPLGWQRGLAALLLGHAVGGALFFAA
              ||||||| |||:||||||||||||||||||||||||||||||||||||||||||||||||
orf125-1      MSGNASSPSSSSAIGLIWFGAAVSIAEISTGTLLAPLGWQRGLAALLLGHAVGGALFFAA
                      10        20        30        40        50        60

                      70        80        90       100       110       120
orf125a.pep   AYIGALTGXXSMESVRLSFGKRGSVLFSVANMLQLAGWTAVMIYAGATVSSALGKVLWDG
              ||||||||  ||||||||||||||||||||||||||||||||||||||||||||||||||
orf125-1      AYIGALTGRSSMESVRLSFGKRGSVLFSVANMLQLAGWTAVMIYAGATVSSALGKVLWDG
                      70        80        90       100       110       120

                     130       140       150       160       170       180
orf125a.pep   ESFVWWALANGALIVLWLVFGARKTGGLKTVSMLLMLLAVLWLSAEXFSTAGSTAAXVXD
              |||||||||||||||||||||||||||||||||||||||||||||| ||||||||| | |
orf125-1      ESFVWWALANGALIVLWLVFGARKTGGLKTVSMLLMLLAVLWLSAEVFSTAGSTAAQVSD
                     130       140       150       160       170       180

                     190       200       210       220       230       240
orf125a.pep   GMSFGTAVELSAVMPLSWLPLAADYTRHARRPFAATLTATLAYTLTGCWMYALGLAAALF
              ||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
orf125-1      GMSFGTAVELSAVMPLSWLPLAADYTRHARRPFAATLTATLAYTLTGCWMYALGLAAALF
                     190       200       210       220       230       240

                     250       260       270       280       290       300
orf125a.pep   TGETDVAKILLGAGLGAAGILAVVLSTVTTTFLDAYSAGVSANNISAKLSEIPIAVAVAV
              |||||||||||||||||||||||||||||||||||||||:|||||||:::| |:||:|::
orf125-1      TGETDVAKILLGAGLGAAGILAVVLSTVTTTFLDAYSAGASANNISARFAETPVAVGVTL
                     250       260       270       280       290       300

                     310       320       330       340
orf125a.pep   VGTLLAVLLPVTEYENFLLLIGSVFAPMAAVLIADFFVLKRREEIEG
              :||:|||:|||||||||||||||||||||||||||||||||||||||
orf125-1      IGTVLAVMLPVTEYENFLLLIGSVFAPMAAVLIADFFVLKRREEIEGFDFAGLVLWLAGF
                     310       320       330       340       350       360
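
The line between the two sequences of each alignment marks identical residues with `|` and conservative replacements with `:`. The sketch below regenerates such a line for a gapless pair. The similarity groups are hypothetical and chosen only for illustration; the original alignment program derives its `:` marks from a scoring matrix, so the output of this sketch may differ from the printed alignments at some positions.

```python
# Hypothetical similarity groups; NOT the scoring matrix of the original program.
SIMILAR_GROUPS = ("ILVM", "FYW", "KRH", "DE", "ST", "NQ", "AG")


def midline(a: str, b: str) -> str:
    """Build the match line for two pre-aligned, equal-frame sequences."""
    marks = []
    for x, y in zip(a, b):
        if x == y and x not in "X-":
            marks.append("|")            # identical residue
        elif any(x in group and y in group for group in SIMILAR_GROUPS):
            marks.append(":")            # similar residue under the assumed groups
        else:
            marks.append(" ")            # mismatch, gap or undetermined residue
    return "".join(marks)


# Residues 301-310 of ORF125a and ORF125-1.
print(midline("VGTLLAVLLP", "IGTVLAVMLP"))
```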

Homology with a predicted ORF from N.gonorrhoeae 

ORF125 shows 86.2% identity over a 65 aa overlap with a predicted ORF (ORF125ng) from 
N.gonorrhoeae: 

orf125.pep                                  AGASANNISARFAETPVAVSVTLIGTVLAV 30
                                            |||||||||||||| ||||:|||| |||||
orf125ng      KILLGAGLGITGILAVVLSTVTTTFLDTYSAGASANNISARFAEIPVAVGVTLIRTVLAV 308

orf125.pep    MLPVTEYENFLLLIGSVFAPM-GGFDCRLFRLETA 64
              |||||||:|||||| |||:|| |||||||| |:||
orf125ng      MLPVTEYKNFLLLIRSVFGPMAGGFDCRLFCLKTA 343

An ORF125ng nucleotide sequence <SEQ ID 805> was predicted to encode a protein having amino 
acid sequence <SEQ ID 806>: 

1 MSGMASSPSS SAAIGLVWFG AAVSIAEIST GTLLAPLGWQ RGLAALLLGH 
51 AVGGALFFAA AYIGALTGRS SMESVRLSFG KCGSVLFSVA NMLQLAGWTA 
101 VMIYVGATVS SALGKVLWDG ESFVWWALAN GALIVLWLVF GARRTGGLKT 
151 VSMLLMLLAV LWLSVEVFAS SGTNAAPAVS DGMTFGTAVE LSAVMPLSWL 
201 PLAADYTRQA RRPFAATLTA TLAYTLTGCW MYALGLAAAL FTGETDVAKI 
251 LLGAGLGITG ILAVVLSTVT TTFLDTYSAG ASANNISARF AEIPVAVGVT 
301 LIRTVLAVML PVTEYKNFLL LIRSVFGPMA GGFDCRLFCL KTA* 

Further work revealed the following gonococcal DNA sequence <SEQ ID 807>: 



1 ATGTCGGGCA ATGCCTCCTC TCCTTCATCT TCCGCCGCCA TCGGGCTGGT 

51 TTGGTTCGGC GCGGCGGTAT CGATTGCCGA AATCAGCACG GGTACGCTGC 

101 TCGCCCCCTT GGGCTGGCAG CGCGGTCTGG CGGCCCTGCT TTTGGGTCAT 

151 GCCGTCGGCG GCGCGCTGTT TTTTGCGGCG GCGTATATCG GCGCACTGAC 

201 CGGACGCAGC TCGATGGAAA GTGTGCGCCT GTCGTTCGGC AAATGCGGTT 

251 CAGTGCTGTT TTCCGTGGCG AATATGCTGC AACTGGCCGG CTGGACGGCG 

301 GTGATGATTT ACGTCGGCGC AACGGTCAGC TCCGCTTTGG GCAAAGTGTT 

351 GTGGGACGGC GAATCCTTTG TCTGGTGGGC ATTGGCAAAC GGCGCACTGA 

401 TCGTGCTGTG GCTGGTTTTC GGCGCACGCA GAACGGGCGG GCTGAAAACC 

451 GTTTCGATGC TGCTGATGCT GCTTGCCGTG TTGTGGTTGA GCGTCGAAGT 

501 GTTCGCTTCG TCCGGCACAA ACGCCGCGCC CGCCGTTTCA GACGGCATGA 

551 CCTTCGGAAC GGCAGTCGAA CTGTCCGCCG TCATGCCGCT TTCCTGGCTG 

601 CCGCTGGCCG CCGACTACAC GCGCCAAGCA CGCCGCCCGT TTGCGGCAAC 

651 CCTGACGGCA ACGCTCGCCT ATACGCTGAC GGGCTGCTGG ATGTATGCCT 

701 TGGGTTTGGC GGCGGCTCTG TTTACCGGAG AAACCGACGT GGCGAAAATC 

751 CTGTTGGGCG CGGGCTTGGG CATAACGGGC ATTCTGGCAG TCGTCCTCTC 

801 CACCGTTACC ACAACGTTTC TCGATACCTA TTCCGCCGGC GCGAGTGCGA 

851 ACAACATTTC CGCGCGTTTT GCGGAAATAC CCGTCGCTGT CGGCGTTACC 

901 CTGATCGGCA CGGTGCTTGC CGTCATGCTG CCCGTTACCG AATATAAAAA 

951 CTTCCTGCTG CTTATCGGCT CGGTATTTGC GCCGATGGCG GCGGTTTTGA 

1001 TTGCCGACTT TTTCGTCTTA AAACGGCGTG AGGAGATTGA AGGCTTTGAC 

1051 TTTGCCGGAC TGGTTCTGTG GCTGGCAGGC TTCATCCTCT ACCGCTTCCT 

1101 GCTCTCGTCC GGTTGGGAAA GCAGCATCGG TCTGACCGCC CCCGTAATGT 

1151 CTGCCGTTGC CATTGCCACC GTATCGGTAC GCCTTTTCTT TAAAAAAACC 

1201 CAATCTTTAC AAAGGAACCC GTCATGA 

This corresponds to the amino acid sequence <SEQ ID 808; ORF125ng-1>: 



1 MSGNASSPSS SAAIGLVWFG AAVSIAEIST GTLLAPLGWQ RGLAALLLGH 

51 AVGGALFFAA AYIGALTGRS SMESVRLSFG KCGSVLFSVA NMLQLAGWTA 

101 VMIYVGATVS SALGKVLWDG ESFVWWALAN GALIVLWLVF GARRTGGLKT 

151 VSMLLMLLAV LWLSVEVFAS SGTNAAPAVS DGMTFGTAVE LSAVMPLSWL 

201 PLAADYTRQA RRPFAATLTA TLAYTLTGCW MYALGLAAAL FTGETDVAKI 

251 LLGAGLGITG ILAVVLSTVT TTFLDTYSAG ASANNISARF AEIPVAVGVT 

301 LIGTVLAVML PVTEYKNFLL LIGSVFAPMA AVLIADFFVL KRREEIEGFD 

351 FAGLVLWLAG FILYRFLLSS GWESSIGLTA PVMSAVAIAT VSVRLFFKKT 

401 QSLQRNPS* 

ORF125ng-1 and ORF125-1 show 95.1% identity in 408 aa overlap: 



10 20 30 40 50 60 

or f 125-1. pep MSGNASSPSSSSAIGLIWFGAAVSIAEISTGTLLAPLGWQRGLAALLLGHAVGGALFFAA 
Mlttlllllt:tlli:||l|[[litltlillitlMlllllt]IMIIIIIMjtll)l 
orf125ng-1 MSGNASSPSSSAAIGLVWFGAAVSIAEISTGTLLAPLGWQRGLAALLLGHAVGGALFFAA 

10 20 30 40 50 60 



70 80 90 100 110 120 

orf 125-1 . pep AYIGALTGRSSMESVRLSFGKRGSVLFSVANMLQLAGWTAVMIYAGATVSSALGKVLWDG 
i I i I I t I I I I f I I M I I t 1 I I I I I I j I I I I I I i I I I I I I I I I I : i I I i I I I i I I I I I I I 
orfl25ng-l AYIGALTGRSSMESVRLSFGKCGSVLFSVANMLQLAGWTAVMIYVGATVSSALGKVLWDG 

70 80 90 100 110 120 



130 140 150 160 170 179 

orf 125-1 . pep ESFVWWALANGALIVLWLVFGARKTGGLKTVSMLLMLLAVLWLSAEVFSTAGSTAAQ-VS 
I I M I I I M I I I I I [ I I I I I I I t : I I I I I I t I i I I M I I I I I I I : I I I : :: I : : I I II 
orf125ng-1 ESFVWWALANGALIVLWLVFGARRTGGLKTVSMLLMLLAVLWLSVEVFASSGTNAAPAVS 

130 140 150 160 170 180 



180 190 200 210 220 230 239 

orf 125-1 . pep DGMSFGTAVELSAVMPLSWLPLAADYTRHARRPFAATLTATLAYTLTGCWMYALGLAAAL 
I i I : I I I I I I I t I I I t I I 1 I 1 I I I I I I ] : t I I 1 I t I I I I I t I I I ! I I I I t i M I I I t t I I 
orf125ng-1 DGMTFGTAVELSAVMPLSWLPLAADYTRQARRPFAATLTATLAYTLTGCWMYALGLAAAL 

190 200 210 220 230 240 



240 250 260 270 280 290 299 

orf 125-1 . pep FTGETDVAKILLGAGLGAAGILAWLSTVTTTFLDAYSAGASANNISARFAETPVAVGVT 
I I I I I I I M I I t M H I : I I I I I I I I i I I I M I I : I i i i I I I I I I I I I I I I t I I I [ I I 
orf125ng-1 FTGETDVAKILLGAGLGITGILAWLSTVTTTFLDTYSAGASANNISARFAEIPVAVGVT 

250 260 270 280 290 300 
300 310 320 330 340 350 359 

orf 125-1 . pep LIGTVLAVMLPVTEYENFLLLIGSVFAPMAAVLIADFFVLKRREEIEGFDFAGLVLWLAG 
I t I I I I I i 1 I I I I 1 i : I I I I I t I t I M I t I I M I I I I I I M t I M I 1 I I I I I I I I I I I I I 
orfl25ng-l LIGTVLAVMLPVTEYKNFLLLIGSVFAPMAAVLIADFFVLKRREEIEGFDFAGLVLWLAG 

310 320 330 340 350 360 

360 370 380 390 400 

orf 125-1 . pep FILYRFLLSSGWESSIGLTAPVMSAVAIATVSVRLFFKKTQSLQRNPSX 
I I I I I I t I It I t I I I i I I I I I I I ) I I I I I I 1 M t t i M I I I I I I I I I I [ 
orfl25ng-l FILYRFLLSSGWESSIGLTAPVMSAVAIATVSVRLFFKKTQSLQRNPSX 

370 380 390 400 

Based on this analysis, including the presence of a putative leader sequence and transmembrane 
domains in the gonococcal protein, it is predicted that the proteins from N.meningitidis and 
N.gonorrhoeae, and their epitopes, could be useful antigens for vaccines or diagnostics, or for 
raising antibodies. 
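
By way of illustration only, a figure such as "95.1% identity in 408 aa overlap" can be obtained by counting identical positions across two sequences that have already been aligned. The following minimal Python sketch does this for one 60-residue block of the ORF125-1 / ORF125ng-1 alignment shown above. The function name `percent_identity`, and the convention that a column containing a single gap counts towards the overlap length but not towards the identities, are assumptions made for this sketch; the alignment program used to produce the figures in this Example may apply a different convention.

```python
def percent_identity(a: str, b: str):
    """Return (identical, overlap, percent) for two pre-aligned sequences."""
    if len(a) != len(b):
        raise ValueError("aligned sequences must have the same length")
    # Drop columns in which both rows hold a gap; they carry no information.
    columns = [(x, y) for x, y in zip(a, b) if not (x == "-" and y == "-")]
    if not columns:
        raise ValueError("no aligned columns")
    # A column is identical only if the residues match and neither is a gap.
    identical = sum(1 for x, y in columns if x == y and x != "-")
    overlap = len(columns)
    return identical, overlap, 100.0 * identical / overlap


# One 60-residue block taken from the alignment above, written in groups of ten.
ORF125_1 = "".join([
    "LIGTVLAVML", "PVTEYENFLL", "LIGSVFAPMA",
    "AVLIADFFVL", "KRREEIEGFD", "FAGLVLWLAG",
])
ORF125NG_1 = "".join([
    "LIGTVLAVML", "PVTEYKNFLL", "LIGSVFAPMA",
    "AVLIADFFVL", "KRREEIEGFD", "FAGLVLWLAG",
])

identical, overlap, pct = percent_identity(ORF125_1, ORF125NG_1)
print(f"{identical}/{overlap} identical ({pct:.1f}%)")
```

The two blocks differ at a single position (E in the meningococcal sequence, K in the gonococcal sequence), so this block on its own scores higher than the 95.1% reported for the full-length overlap.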



Example 96 

The following partial DNA sequence was identified in N.meningitidis <SEQ ID 809>: 

1 ATGACCCGTA TCGCCATCCT CGGCGGCGGC CTCTCGGGAA GGCTGACCGC 

51 GTTGCAGCTT GCAGAACAAG GTTATCAGAT TGCACTTTTC GATAAAAGCT 

101 GCCGCCGGGG CGAACACGCC GCCGCCTATG TAGCCGCCGC CATGCTCGCG 

151 CCTGCAGCGG A.ACGGTCGA AGCCACGCCC GAAGTGGTCA GGCTGGGCAG 

201 GCAGAGCATC CCGCTTTGGC GCGGCATCCG ATGCCGTCTG AACACGCACA 

251 CGATGATGCA GGAAAACGGC AGCCTGATTG TATGGCACGG GCAGGACAAG 

301 CCATTATCCA GCGAGTTCGT CCGCCATCTC AAACGCGGCG GCGT.ACGGA 

351 TGACGAAATC GTCCGTTGGC GCGCCGACGA CATCGCCGAA CGCGAACCGC 

401 AACTCGGCGG ACGTTTTTAA GACGGCATCT ACCTGCCGAC CGAAGC.CAG 

451 CTCGACGGGC GGCAATTATA GTCTGCACTT GCCGACGCTT TGGACGAACT 

501 GAACGTCCCC TGCCATTGGG AACACGAATG CGTCCCCGAA GCCTGCAAG. . 

This corresponds to the amino acid sequence <SEQ ID 810; ORF126>: 

1 MTRIAILGGG LSGRLTALQL AEQGYQIALF DKSCRRGEHA AAYVAAAMLA 

51 PAAXTVEATP EVVRLGRQSI PLWRGIRCRL NTHTMMQENG SLIVWHGQDK 

101 PLSSEFVRHL KRGGXTDDEI VRWRADDIAE REPQLGGRFX DGIYLPTEXQ 

151 LDGRQLXSAL ADALDELNVP CHWEHECVPE ACK. . . 
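
The relationship between a partial nucleotide sequence and the corresponding amino acid sequence can be sketched as follows. This minimal Python example assumes the standard genetic code and reads the sequence in frame from its first nucleotide; any codon containing an undetermined base (shown as "." in the listing above) is reported as X, in line with the X residues in ORF126. The names `CODON_TABLE` and `translate` are illustrative only and do not refer to the software actually used for the analysis.

```python
BASES = "TCAG"
# Standard genetic code, with codons ordered T, C, A, G at each position.
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    b1 + b2 + b3: AMINO_ACIDS[16 * i + 4 * j + k]
    for i, b1 in enumerate(BASES)
    for j, b2 in enumerate(BASES)
    for k, b3 in enumerate(BASES)
}


def translate(dna: str) -> str:
    """Translate DNA in frame 1; codons with unknown bases become 'X'."""
    dna = "".join(dna.split()).upper()  # remove the spacing between groups
    usable = len(dna) - len(dna) % 3    # ignore a trailing partial codon
    return "".join(
        CODON_TABLE.get(dna[i:i + 3], "X") for i in range(0, usable, 3)
    )


# Nucleotides 1-30 of the partial sequence above.
print(translate("ATGACCCGTA TCGCCATCCT CGGCGGCGGC"))
# Nucleotides 151-171, which include one undetermined base.
print(translate("CCTGCAGCGG A.ACGGTCGA A"))
```

The first call reproduces the opening residues MTRIAILGGG of ORF126, and the second yields PAAXTVE, where the X arises from the codon containing the undetermined base.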

Further work revealed the complete nucleotide sequence <SEQ ID 811>: 

1 ATGACCCGTA TCGCCATCCT CGGCGGCGGC CTCTCGGGAA GGCTGACCGC 

51 GTTGCAGCTT GCAGAACAAG GTTATCAGAT TGCACTTTTC GATAAAGGCT 

101 GCCGCCGGGG CGAACACGCC GCCGCCTATG TTGCCGCCGC CATGCTCGCG 

151 CCTGCGGCGG AAGCGGTCGA AGCCACGCCC GAAGTGGTCA GGCTGGGCAG 

201 GCAGAGCATC CCGCTTTGGC GCGGCATCCG ATGCCGTCTG AACACGCACA 

251 CGATGATGCA GGAAAACGGC AGCCTGATTG TGTGGCACGG GCAGGACAAG 

301 CCATTATCCA GCGAGTTCGT CCGCCATCTC AAACGCGGCG GCGTAGCGGA 

351 TGACGAAATC GTCCGTTGGC GCGCCGACGA CATCGCCGAA CGCGAACCGC 

401 AACTCGGCGG ACGTTTTTCA GACGGCATCT ACCTGCCGAC CGAAGGCCAG 

451 CTCGACGGGC GGCAAATATT GTCTGCACTT GCCGACGCTT TGGACGAACT 

501 GAACGTCCCC TGCCATTGGG AACACGAATG CGTCCCCGAA GGCCTGCAAG 

551 CCCAATACGA CTGGCTGATC GACTGCCGCG GCTACGGCGC AAAAACCGCG 

601 TGGAACCAAT CCCCCGAGCA CACCAGCACC CTGCGCGGCA TACGCGGCGA 

651 AGTGGCGCGG GTTTACACAC CCGAAATCAC GCTCAACCGC CCCGTGCGTC 

701 TGCTCCATCC GCGTTATCCG CTCTACATCG CCCCGAAAGA AAACCACGTC 

751 TTCGTCATCG GCGCGACCCA AATCGAAAGC GAAAGCCAAG CCCCCGCCAG 

801 CGTGCGTTCA GGGTTGGAAC TCTTGTCCGC ACTCTATGCC ATCCACCCCG 

851 CCTTCGGCGA AGCCGACATC CTCGAAATCG CCACCGGCCT GCGCCCCACG 

901 CTCAACCACC ACAACCCCGA AATCCGTTAC AACCGCGCCC GACGCCTGAT 

951 TGAAATCAAC GGCCTTTTCC GCCACGGTTT CATGATCTCC CCCGCCGTAA 

1001 CCGCCGCCGC CGCCAGATTG GCAGTGGCAC TGTTTGACGG AAAAGACGCG 

1051 CCCGAACGCG ATAAAGAAAG CGGTTTGGCG TATATCCGAA GACAAGATTA 

1101 A 
This corresponds to the amino acid sequence <SEQ ID 812; ORF126-1>: 

1 MTRIAILGGG LSGRLTALQL AEQGYQIALF DKGCRRGEHA AAYVAAAMLA 

51 PAAEAVEATP EVVRLGRQSI PLWRGIRCRL NTHTMMQENG SLIVWHGQDK 

101 PLSSEFVRHL KRGGVADDEI VRWRADDIAE REPQLGGRFS DGIYLPTEGQ 

151 LDGRQILSAL ADALDELNVP CHWEHECVPE GLQAQYDWLI DCRGYGAKTA 

201 WNQSPEHTST LRGIRGEVAR VYTPEITLNR PVRLLHPRYP LYIAPKENHV 

251 FVIGATQIES ESQAPASVRS GLELLSALYA IHPAFGEADI LEIATGLRPT 

301 LNHHNPEIRY NRARRLIEIN GLFRHGFMIS PAVTAAAARL AVALFDGKDA 

351 PERDKESGLA YIRRQD* 

Computer analysis of this amino acid sequence gave the following results: 
Homology with a predicted ORF from N.meningitidis (strain A) 

ORF126 shows 90.0% identity over a 180aa overlap with an ORF (ORF126a) from strain A of N. 
meningitidis: 

10 20 30 40 50 60 

or f 12 6 . pep MTRIAILGGGLSGRLTALQLAEQGYQIALFDKSCRRGEHAAAYVAAAMLAPAAXTVEATP 
I I I I i I I I I I I I I I I I M I M I i I I I 1 I I I I f : I I I I I I I I I I I t M I I I I ! I : I I I I I 
orfl26a MTRIAILGGGLSGRLTALQLAEQGYQIALFDKGCRRGEHAAAYVAAAMLAPAAEAVEATP 

10 20 30 40 50 60 

70 80 90 100 110 120 

orf 126 . pep EWRLGRQSIPLWRGIRCRLNTHTMMQENGSLIVWHGQDKPLSSEFVRHLKRGGXTDDEI 
llllllll tMllltlt:l:l :li i I I I I I I I I I I i I I I I : I t I I [ I I I I t :it I 
orf 126a EWRLGRQXIPLWRGIRCHLKTPAMMXENGSLIVWHGQDKPLSNEFVRHLKRGGVADDXI 

70 80 90 100 110 120 

130 140 150 160 170 180 

orf 126 . pep VRWRADDIAEREPQLGGRFXDGIYLPTEXQLDGRQLXSALADALDELNVPCHWEHECVPE 
I I I I I I I I I I I I i I I I I i I llllllll llllll: llllllllllllllllllll:li 
orf 126a VRWRADDIAEREPQLGGRFSDGIYLPTEGQLDGRQILSALADALDELNVPCHWEHECAPE 

130 140 150 160 170 180 

The complete length ORF126a nucleotide sequence <SEQ ID 813> is: 

1 ATGACCCGTA TCGCCATCCT CGGCGGCGGC CTCTCNGGAA GGCTGACCGC 

51 ACTGCAGCTT GCAGAACAAG GTTATCAGAT TGCACTTTTC GATAAAGGCT 

101 GCCGCCGGGG CGAACACGCC GCCGCCTATG TTGCCGCCGC CATGCTCGCG 

151 CCTGCGGCGG AAGCGGTCGA AGCCACGCCT GAAGTGGTCA GGCTGGGCAG 

201 GCAGANCATC CCGCTTTGGC GCGGCATCCG ATGCCATCTG AAAACGCCTG 

251 CCATGATGCA NGAAAACGGC AGGCTGATTG TGTGGCACGG GCAGGACAAA 

301 CCTTTATCCA ACGAGTTCGT CCGCCATCTC AAACGCGGCG GCGTAGCGGA 

351 TGACNAAATC GTCCGTTGGC GCGCCGACGA CATCGCCGAA CGCGAACCGC 

401 AACTCGGCGG ACGTTTTTCA GACGGCATCT ACCTGCCGAC CGAAGGCCAG 

451 CTCGACGGGC GGCAAATATT GTCTGCACTT GCCGACGCTT TGGACGAACT 

501 GAACGTCCCC TGCCATTGGG AACACGAATG TGCCCCCGAA GACTTGCAAG 

551 CCCAATACGA CTGGCTGATC GACTGCCGCG GCTACGGCGC AAAAACCGCG 

601 TGGAACCAAT CCCCCGANNA NACCAGCACC CTGCGCGGCA TACGCGGCGA 

651 AGTGGCGCGG GTTTACACAC CCGAAATCAC GCTCAACCGC CCCGTGCGCC 

701 TGCTACACCC GCGCTATCCG CTNTACATCG CCCCGAAAGA AAACCNCGTC 

751 TTCGTCATCG GCGCGACCCA AATCGAAAGC GAAAGCCAAG CACCTGCCAG 

801 CGTGCGTTCC GGGCTGGAAC TCTTATCCGC ACTCTATGCC GTCCACCCCG 

851 CCTTCGGCGA AGCCGACATC CTCGAAATCG CCACCGGCCT GCGCCCCACG 

901 CTCAATCACC ACAACCCCGA AATCCGTTAC AACCGCGCCC GACGCCTGAT 

951 TGAAATCAAC GGCCTTTTCC GCCACGGTTT CATGATCTCC CCCGCCGTAA 

1001 CCGCCGCCGC CGTCAGATTG GCAGTGGCAC TGTTTGACGG AAAAGANGCG 

1051 CCCGAACGCG ATGAAGAAAG CGGTTTGGCG TATATCCGAA GACAAGATTA 

1101 A 

This encodes a protein having amino acid sequence <SEQ ID 814>: 

1 MTRIAILGGG LSGRLTALQL AEQGYQIALF DKGCRRGEHA AAYVAAAMLA 

51 PAAEAVEATP EVVRLGRQXI PLWRGIRCHL KTPAMMXENG SLIVWHGQDK 

101 PLSNEFVRHL KRGGVADDXI VRWRADDIAE REPQLGGRFS DGIYLPTEGQ 

151 LDGRQILSAL ADALDELNVP CHWEHECAPE DLQAQYDWLI DCRGYGAKTA 

201 WNQSPXXTST LRGIRGEVAR VYTPEITLNR PVRLLHPRYP LYIAPKENXV 

251 FVIGATQIES ESQAPASVRS GLELLSALYA VHPAFGEADI LEIATGLRPT 

301 LNHHNPEIRY NRARRLIEIN GLFRHGFMIS PAVTAAAVRL AVALFDGKXA 

351 PERDEESGLA YIRRQD* 

ORF126a and ORF126-1 show 95.4% identity in 366 aa overlap: 






10 20 30 40 50 60 

orf126a.pep MTRIAILGGGLSGRLTALQLAEQGYQIALFDKGCRRGEHAAAYVAAAMLAPAAEAVEATP 
I I i I I I I I I I 1 I I I ) I I I t I M I t t 1 I I I M I t I I I I I I t I I I I i I I I t I I I I I I t I t t I 
orf126-1 MTRIAILGGGLSGRLTALQLAEQGYQIALFDKGCRRGEHAAAYVAAAMLAPAAEAVEATP 

10 20 30 40 50 60 

70 80 90 100 110 120 

orf 126a . pep EWRLGRQXIPLWRGIRCHLKTPAMMXENGSLIVWHGQDKPLSNEFVRHLKRGGVADDXI 
t I I M I I I I I I t I I I I I : I : I : I I I I I t I t I 1 I I I ] I I t I : t I I i i I t t ! I 1 I t I I 
orf 126-1 EWRLGRQSIPLWRGIRCRLNTHTMMQENGSLIVWHGQDKPLSSEFVRHLKRGGVADDEI 

70 80 90 100 110 120 



130 140 150 160 170 180 

or f 12 6a . pep VRWRADDIAEREPQLGGRFSDGIYLPTEGQLDGRQILSALADALDELNVPCHWEHECAPE 
I I I I i I i I i I I i t I M 1 I M I I t I I I I i I I I I I M I I I M I I M I I 11 I I I i I 11 1 I: 1 1 
orf126-1 VRWRADDIAEREPQLGGRFSDGIYLPTEGQLDGRQILSALADALDELNVPCHWEHECVPE 

130 140 150 160 170 180 






190 200 210 220 230 240 

orf 126a. pep DLQAQYDWLIDCRGYGAKTAWNQSPXXTSTLRGIRGEVARVYTPEITLNRPVRLLHPRYP 
I i I M I I I I I j I M i I I I I I I I I I I I I I I I I M I I I I [ t i I I f I I I I I I M I I I I I I 
orf 126-1 GLQAQYDWLIDCRGYGAKTAWNQSPEHTSTLRGIRGEVARVYTPEITLNRPVRLLHPRYP 
190 200 210 220 230 240 






250 260 270 280 290 300 

orf126a.pep LYIAPKENXVFVIGATQIESESQAPASVRSGLELLSALYAVHPAFGEADILEIATGLRPT 
M I I I I I I M I t I I 1 I I i I It I I t t [ I I I I M I I I I I I I : I I I t It I I I I f I M I I M t 
orf 126-1 LYIAPKENHVFVIGATQIESESQAPASVRSGLELLSALYAIHPAFGEADILEIATGLRPT 

250 260 270 280 290 300 

310 320 330 340 350 360 

orf 12 6a. pep LNHHNPEIRYNRARRLIEINGLFRHGFMISPAVTAAAVRLAVALFDGKXAPERDEESGLA 
I i 1 I I I It 1 1 I t I t It 1 1 I t I I I II M I I I I It I 1 I I : I I i I It I I II 11111:11111 
orf 126-1 LNHHNPEIRYNRARRLIEINGLFRHGFMISPAVTAAAARLAVALFDGKDAPERDKESGLA 

310 320 330 340 350 360 



orf 12 6a . pep YIRRQDX 
I I I I I I I 

orf 12 6-1 YIRRQDX 


Homology with a predicted ORF from N.gonorrhoeae 

ORF126 shows 90% identity over a 180 aa overlap with a predicted ORF (ORF126ng) from 
N.gonorrhoeae: 



or f 12 6 . pep MTRIAILGGGLSGRLTALQLAEQGYQIALFDKSCRRGEHAAAYVAAAMLAPAAXTVEATP 60 
I I I It : t II t I It I I t I 1 I I I I 1 I I 1 I nil: 1 : I t I i I f i 1 t 1 I t I 1 I I I : I I I 1 I 

orf 12 6ng MTRIAVLGGGLSGRLTALQLAEQGYQIELFDKGTRQGEHAAAYVAAAMLAPAAEAVEATP 60 

orf 126. pep EWRLGRQSIPLWRGIRCRLNTHTMMQENGSLIVWHGQDKPLSSEFVRHLKRGGXTDDEI 120 
II : I I It It t It I I 1 II i I I II I I I II I I I II I i I I II II I I I I I I I I I I I I I : I t I I 
orf126ng EVIRLGRQSIPLWRGIRCRLNTLTMMQENGSLIVWHGQDKPLSSEFVRHLKRGGVADDEI 120 

or f 12 6 . pep VRWRADDIAEREPQLGGRFXDGIYLPTEXQLDGRQLXSALADALDELNVPCHWEHECVPE 180 

I I I II I : I II I M I i M I I IIMMM lllllt: I I I I II I I II I i I I I II I I I : t : 
orf 12 6ng VRWRADEIAEREPQLGGRFSDGIYLPTEGQLDGRQILSALADALDELNVPCHWEHECAPQ 180 

An ORF126ng nucleotide sequence <SEQ ID 815> was predicted to encode a protein having amino 
acid sequence <SEQ ID 816>: 



1 MTRIAVLGGG LSGRLTALQL AEQGYQIELF DKGTRQGEHA AAYVAAAMLA 

51 PAAEAVEATP EVIRLGRQSI PLWRGIRCRL NTLTMMQENG SLIVWHGQDK 

101 PLSSEFVRHL KRGGVADDEI VRWRADEIAE REPQLGGRFS DGIYLPTEGQ 

151 LDGRQILSAL ADALDELNVP CHWEHECAPQ DLQAQYDWVI DCRGYGAKTA 

201 WNQSPEHTST LRGIRGEVRG FTRPKSRSTA PCACCTRAIR STSPRKKTTS 

251 SSSARPKSKA KAKPPPAYVP GWNSYPRSMP STPPSAKPTS SKWRPGLRPT 

301 LNHHNPEIRY SRERRLIEIN GLFRHGFMIS PAVTAAAVRL AVALFDGKDA 

351 PERDEESGLA YIGRQD* 

Further work revealed the following gonococcal DNA sequence <SEQ ID 817>: 

1 ATGACCCGTA TCGCCGTCCT CGGAGGCGGC CTTTCCGGAA GGCTGACCGC 

51 ATTGCAGCTT GCAGAACAAG GTTATCAGAT TGAACTTTTC GACAAGGGCA 

101 CCCGCCAAGG CGAACACGCC GCCGCCTATG TTGCCGCCGC GATGCTCGCG 

151 CCTGCGGCGG AAGCGGTCGA GGCAACGCCC GAAGTCATCA GGCTGGGCAG 

201 GCAGAGCATT CCGCTTTGGC GCGGCATCCG ATGCCGTCTG AACACGCTCA 

251 CGATGATGCA GGAAAACGGC AGCCTGATTG TGTGGCACGG GCAGGACAAG 

301 CCATTATCCA GCGAGTTCGT CCGCCATCTC AAACGCGGCG GCGTAGCGGA 

351 TGACGAAATC GTCCGTTGGC GCGCCGATGA AATCGCCGAA CGCGAACCGC 

401 AACTCGGCGG ACGTTTTTCA GACGGCATCT ACCTGCCGAC CGAAGGCCAG 

451 CTCGACGGGC GGCAAATATT GTCTGCACTT GCCGACGCTT TGGACGAACT 

501 GAACGTCCCT TGCCATTGGG AACACGAATG CGCCCCCCAA GACCTGCAAG 

551 CCCAATACGA CTGGGTAATC GACTGCCGGG GCTACGGCGC GAAAACCGCG 

601 TGGAACCAAT CCCCCGAGCA CACCAGCACC TTGCGCGGCA TACGCGGCGA 

651 AGTGGCGCGG GTTTACACGC CCGAAATCAC GCTCAACCGC CCCGTGCGCC 

701 TGCTGCACCC GCGCTATCCG CTCTACATCG CCCCGAAAGA AAACCACGTC 

751 TTCGTCATCG GCGCGACCCA AATCGAAAGC GAAAGCCAAG CCCCCGCCAG 

801 CGTACGTTCC GGGCTGGAAC TCTTATCCGC GCTCTATGCC GTCCACCCCG 

851 CCTTCGGCGA AGCCGACATC CTCGAAATCG CCGCCGGCCT GCGCCCCACG 

901 CTCAACCACC ACAACCCCGA AATCCGCTAC AGCCGCGAAC GCCGCCTCAT 

951 CGAAATCAAC GGCCTTTTCC GGCACGGCTT TATGATTTCC CCCGCCGTAA 

1001 CCGCCGCCGC CGTCAGATTG GCAGTGGCAC TGTTTGACGG AAAAGACGCG 

1051 CCCGAACGTG ATGAAGAAAG CGGTTTGGCG TATATCGGAA GACAAGATTA 

1101 A 

This corresponds to the amino acid sequence <SEQ ID 818; ORF126ng-1>: 

1 MTRIAVLGGG LSGRLTALQL AEQGYQIELF DKGTRQGEHA AAYVAAAMLA 

51 PAAEAVEATP EVIRLGRQSI PLWRGIRCRL NTLTMMQENG SLIVWHGQDK 

101 PLSSEFVRHL KRGGVADDEI VRWRADEIAE REPQLGGRFS DGIYLPTEGQ 

151 LDGRQILSAL ADALDELNVP CHWEHECAPQ DLQAQYDWVI DCRGYGAKTA 

201 WNQSPEHTST LRGIRGEVAR VYTPEITLNR PVRLLHPRYP LYIAPKENHV 

251 FVIGATQIES ESQAPASVRS GLELLSALYA VHPAFGEADI LEIAAGLRPT 

301 LNHHNPEIRY SRERRLIEIN GLFRHGFMIS PAVTAAAVRL AVALFDGKDA 

351 PERDEESGLA YIGRQD* 

ORF126ng-1 and ORF126-1 show 95.1% identity in 366 aa overlap: 

10 20 30 40 50 60 

orf126-1.pep MTRIAILGGGLSGRLTALQLAEQGYQIALFDKGCRRGEHAAAYVAAAMLAPAAEAVEATP 
I I I I I : I I I I I I It I I I t t i I I I I I M I I I t I I : I I H M t I I I I I I I I I I i i I I I I I 
orf126ng-1   MTRIAVLGGGLSGRLTALQLAEQGYQIELFDKGTRQGEHAAAYVAAAMLAPAAEAVEATP 
10 20 30 40 50 60 

70 80 90 100 110 120 

orf126-1.pep EVVRLGRQSIPLWRGIRCRLNTHTMMQENGSLIVWHGQDKPLSSEFVRHLKRGGVADDEI 
t I : I I i I I t I I i I I I I 11 I i I I M I I t I I M t I i I I i I I I It I I I t I I 1 I I t t I i I I t I 
orf126ng-1   EVIRLGRQSIPLWRGIRCRLNTLTMMQENGSLIVWHGQDKPLSSEFVRHLKRGGVADDEI 
70 80 90 100 110 120 

130 140 150 160 170 180 

orf126-1.pep VRWRADDIAEREPQLGGRFSDGIYLPTEGQLDGRQILSALADALDELNVPCHWEHECVPE 
t I t t t I : t t M I t I t t i t t II t I t i I t t t I t I t t t t t II t t t t I t t t I t t I I I t I t t : I : 
orf126ng-1   VRWRADEIAEREPQLGGRFSDGIYLPTEGQLDGRQILSALADALDELNVPCHWEHECAPQ 
130 140 150 160 170 180 

190 200 210 220 230 240 

orf126-1.pep GLQAQYDWLIDCRGYGAKTAWNQSPEHTSTLRGIRGEVARVYTPEITLNRPVRLLHPRYP 
II II I I 1 : M I I 11 t I t I 1 I 1 I 1 1 1 t 1 I I I t 1 1 1 I 1 I 1 1 1 1 1 1 1 1 1 I I II t M I It t II 
orf126ng-1   DLQAQYDWVIDCRGYGAKTAWNQSPEHTSTLRGIRGEVARVYTPEITLNRPVRLLHPRYP 
190 200 210 220 230 240 

250 260 270 280 290 300 

orf 12 6-1 . pep LYIAPKENHVFVIGATQIESESQAPASVRSGLELLSALYAIHPAFGEADILEIATGLRPT 
IIIIIMItMlllllllltlllll|[iltitIlllllti:|lllltlllllll:IMI[ 
orfl26ng-l LYIAPKENHVFVIGATQIESESQAPASVRSGLELLSALYAVHPAFGEADILEIAAGLRPT 

250 260 270 280 290 300 



310 320 330 340 350 360 

orf 12 6-1 . pep LNHHNPEIRYNRARRLIEINGLFRHGFMISPAVTAAAARLAVALFDGKDAPERDKESGLA 
1111111111:1 tlllilltllll[llll)illlll:|lllltltllllllll:ltiM 
orfl26ng-l LNHHNPEIRYSRERRLIEINGLFRHGFMISPAVTAAAVRLAVALFDGKDAPERDEESGLA 

310 320 330 340 350 360 

orf 126-1 .pep YIRRQDX 
I i I I I I 

orf 12 6ng-l YIGRQDX 

Furthermore, ORF126ng-1 shows homology to a putative Rhizobium oxidase flavoprotein: 



gi 12627327 (AF004408) putative amino acid oxidase flavoprotein [Rhizobium etli] 
Length = 327 
Score = 169 bits (423), Expect = 3e-41 

Identities = 112/329 (34%), Positives = 163/329 (49%), Gaps = 25/329 (7%) 



Query: 3   RIAVLGGGLSGRLTALQLAEQGYQIELFDKGTRQGEHXXXXXXXXXXXXXXXXXXXXXXX 62 
           RI V G G++G A QL G+++ L ++ G 
Sbjct: 2   RILVNGAGVAGLTVAWQLYRHGFRVTLAERAGTVGA-GASGFAGGMLAPWCERESAEEPV 60 

Query: 63  IRLGRQSIPLWRGIRCRLNTLTMMQENGSLIVWHGQDKPLSSEFVRHLKRGGVADDEIVR 122 
           + LGR + W + G+L+V G+D F R G DE+ 
Sbjct: 61  LTLGRLAADWWEAA-----LPGHVHRRGTLVVAGGRDTGELDRFSRRTS-GWEWLDEVA- 113 

Query: 123 WRADEIAEREPQLGGRFSDGIYLPTEGQLDGRQILSALADALDELNVPCHWEHECAPQDL 182 
           IA EP L GRF ++ E LD RQ L+ALA L++ + + 
Sbjct: 114 -----IAALEPDLAGRFRRALFFRQEAHLDPRQALAALAAGLEDTVRMRLTLG---WGES 165 

Query: 183 QAQYDWVIDCRGYGAKTAWNQSPEHTSTLRGIRGEVARVYTPEITLNRPVRLLHPRYPLY 242 
           +D V+DC G LRG+RGE+ V T E++L+RPVRLLHPR+P+Y 
Sbjct: 166 DVDHDRVVDCTGAA-------QIGRLPGLRGVRGEMLCVETTEVSLSRPVRLLHPRHPIY 218 

Query: 243 IAPKENHVFVIGATQIESESQAPASVRSGLELLSALYAVHPAFGEADILEIAAGLRPTLN 302 
           I P++ + F++GAT IES+ P + RS +ELL+A YA+HPAFGEA + E AG+RP 
Sbjct: 219 IVPRDKNRFMVGATMIESDDGGPITARSLMELLNAAYAMHPAFGEARVTETGAGVRPAYP 278 

Query: 303 HHNPEIRYSRERRLIEINGLFRHGFMISP 331 
           + P R ++E R + +NGL+RHGF+++P 
Sbjct: 279 DNLP--RVTQEGRTLHVNGLYRHGFLLAP 305 






This analysis suggests that the proteins from N.meningitidis and N.gonorrhoeae, and their epitopes, 
could be useful antigens for vaccines or diagnostics, or for raising antibodies. 
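
For reference, the percentages in the database search summary above follow directly from the reported counts. The short Python sketch below recomputes them from the summary line; it assumes integer truncation, which is consistent with the figures shown (for example, 25/329 is about 7.6% and is reported as 7%). The function name `summary_percentages` is illustrative and is not part of any search program.

```python
import re


def summary_percentages(summary: str) -> dict:
    """Recompute truncated integer percentages from 'Name = n/d' entries."""
    result = {}
    for name, numerator, denominator in re.findall(r"(\w+) = (\d+)/(\d+)", summary):
        # Integer division truncates, matching the figures in the summary.
        result[name] = (int(numerator) * 100) // int(denominator)
    return result


line = "Identities = 112/329 (34%), Positives = 163/329 (49%), Gaps = 25/329 (7%)"
print(summary_percentages(line))
```

The recomputed values (34, 49 and 7) agree with the percentages given in the summary line.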



Example 97 



The following DNA sequence, believed to be complete, was identified in N.meningitidis <SEQ ID 
819>: 



1 ATGACTGATA ATCGGGGGTT TACGCTGGTT GAATTAATAT CAGTGGTCTT 

51 GATATTGTCT GTACTTGCTT TAATTGTTTA TCCGAGCTAT CGCAATTATG 

101 TTGAGAAAGC AAAGATAAAT GCAGTGCGGG CAGCCTTGTT AGAAAATGCA 

151 CATTTTATGG AAAAGTTTTA TCTGCAGAAT GGGAGGTTTA AACAAACATC 

201 TACCAAGTGG CCAAGTTTGC CGATTAAAGA GGCAGAAGGC TTTTGTATCC 

251 GTTTGAATGG AATCGtCGCG CGGG. .GCTT TAGACAGTAA ATTCATGTTG 

301 AAGGCGGTAG CCATAGATAA AGATAAAAAT CCTTTTATTA TTAAGATGAA 

351 TGAAAATCTA GTAACCTTTA aTTTGCAAGA AGTCCGCCAG TTCGTGTAGT 

401 GACGGGCTGG ATTATTTTAA AGGAAATGAT AAGGACTGCA AGTTACTTAA 

451 GTAG 
This corresponds to the amino acid sequence <SEQ ID 820; ORF127>: 

1 MTDNRGFTLV ELISVVLILS VLALIVYPSY RNYVEKAKIN AVRAALLENA 
51 HFMEKFYLQN GRFKQTSTKW PSLPIKEAEG FCIRLNGIVA RXALDSKFML 
101 KAVAIDKDKN PFIIKMNENL VTFICKKSAS SCSDGLDYFK GNDKDCKLLK 
151 * 

Further work revealed the following DNA sequence <SEQ ID 821>: 



1 ATGACTGATA ATCGGGGGTT TACGCTGGTT GAATTAATAT CAGTGGTCTT 

51 GATATTGTCT GTACTTGCTT TAATTGTTTA TCCGAGCTAT CGCAATTATG 

101 TTGAGAAAGC AAAGATAAAT GCAGTGCGGG CAGCCTTGTT AGAAAATGCA 

151 CATTTTATGG AAAAGTTTTA TCTGCAGAAT GGGAGGTTTA AACAAACATC 

201 TACCAAGTGG CCAAGTTTGC CGATTAAAGA GGCAGAAGGC TTTTGTATCC 

251 GTTTGAATGG AATCGCGCGC GGGGCTTTAG ACAGTAAATT CATGTTGAAG 

301 GCGGTAGCCA TAGATAAAGA TAAAAATCCT TTTATTATTA AGATGAATGA 

351 AAATCTAGTA ACCTTTATTT GCAAGAAGTC CGCCAGTTCG TGTAGTGACG 

401 GGCTGGATTA TTTTAAAGGA AATGATAAGG ACTGCAAGTT ACTTAAGTAG 

This corresponds to the amino acid sequence <SEQ ID 822; ORF127-1>: 



1 MTDNRGFTLV ELISVVLILS VLALIVYPSY RNYVEKAKIN AVRAALLENA 
51 HFMEKFYLQN GRFKQTSTKW PSLPIKEAEG FCIRLNGIAR GALDSKFMLK 
101 AVAIDKDKNP FIIKMNENLV TFICKKSASS CSDGLDYFKG NDKDCKLLK* 

Computer analysis of this amino acid sequence gave the following results: 



Homology with a predicted ORF from N.meningitidis (strain A) 

ORF127 shows 98.0% identity over a 150aa overlap with an ORF (ORF127a) from strain A of N. 
meningitidis: 

10 20 30 40 50 60 

orf 127 . pep MTDNRGFTLVELISWLILSVLALIVYPSYRNYVEKAKINAVRAALLENAHFMEKFYLQN 
I t I ) t I I I 1 I I I I I i I I t I I I I I I I I I M I I 1 I I It I I I i: I I I 1 [ I I I I I I I I i i I I I I 
orf 127a MTDNRGFTLVELISWLILSVLALIVYPSYRNYVEKAKINTVRAALLENAHFMEKFYLQN 

10 20 30 40 50 60 



70 80 90 100 110 120 

orf 127 . pep GRFKQTSTKWPSLPIKEAEGFCIRLNGIVARXALDSKFMLKAVAIDKDKN PFIIKMNENL 
I I M t I I I I M I I I n I I I I I M M I I I II II M I I t I I I I I I t I I I I t I I I I I I I I I 
orf 127a GRFKQTSTKW PS LP IKEAEGFCIRLNGI-ARGALDSKFMLKAVAIDKDKN PFIIKMNENL 

70 80 90 100 110 



130 140 150 

orf127.pep VTFICKKSASSCSDGLDYFKGNDKDCKLLKX 
I I I I I i I I I t I t I I I I M I I I I I I II I I I I I 
orf127a    VTFICKKSASSCSDGLDYFKGNDKDCKLLKX 
120 130 140 150 

The complete length ORF127a nucleotide sequence <SEQ ID 823> is: 


1 ATGACTGATA ATCGGGGGTT TACGCTGGTT GAATTAATAT CAGTGGTCTT 

51 GATATTGTCT GTACTTGCTT TAATTGTTTA TCCGAGCTAT CGCAATTATG 

101 TTGAGAAAGC AAAGATAAAT ACAGTGCGGG CAGCCTTGTT AGAAAATGCA 

151 CATTTTATGG AAAAGTTTTA TCTGCAGAAT GGGAGATTTA AACAAACATC 

201 TACCAAATGG CCAAGTTTGC CGATTAAAGA GGCAGAAGGC TTTTGTATCC 

251 GTTTGAATGG AATCGCGCGC GGGGCCTTAG ACAGTAAATT CATGTTGAAG 

301 GCGGTAGCCA TAGATAAAGA TAAAAATCCT TTTATTATTA AGATGAATGA 

351 AAATCTAGTA ACCTTTATTT GCAAGAAGTC CGCCAGTTCG TGTAGTGACG 

401 GGCTGGATTA TTTTAAAGGA AATGATAAGG ACTGCAAGTT ACTTAAGTAG 

This encodes a protein having amino acid sequence <SEQ ID 824>: 



1 MTDNRGFTLV ELISVVLILS VLALIVYPSY RNYVEKAKIN TVRAALLENA 
51 HFMEKFYLQN GRFKQTSTKW PSLPIKEAEG FCIRLNGIAR GALDSKFMLK 
101 AVAIDKDKNP FIIKMNENLV TFICKKSASS CSDGLDYFKG NDKDCKLLK* 
ORF127a and ORF127-1 show 99.3% identity in 149 aa overlap: 

10 20 30 40 50 60 

orf127a.pep MTDNRGFTLVELISVVLILSVLALIVYPSYRNYVEKAKINTVRAALLENAHFMEKFYLQN 

llltlllMIIIIIIIIMIIIIIItllllllltlilMi:llll Illllllll 

orf 127-1 MTDNRGFTLVELISVVLILSVLALIVYPSYRNYVEKAKINAVRAALLENAHFMEKFYLQN 

10 20 30 40 50 60 



70 80 90 100 110 120 

orf 127a . pep GRFKQTSTKWPSLPIKEAEGFCIRLNGIARGALDSKFMLKAVAIDKDKNPFIIKMNENLV 
llllllllltlllllMlllltllllltlillllllllMtlllllttllllMlltIi) 
orf127-1 GRFKQTSTKWPSLPIKEAEGFCIRLNGIARGALDSKFMLKAVAIDKDKNPFIIKMNENLV 

70 80 90 100 110 120 



130 140 150 

orf 127a . pep TFICKKSASSCSDGLDYFKGNDKDCKLLKX 
llllllllilllltlllllllllllltlii 
orf127-1 TFICKKSASSCSDGLDYFKGNDKDCKLLKX 

130 140 150 



Homology with a predicted ORF from N.gonorrhoeae 

ORF127 shows 97.3% identity over a 150 aa overlap with a predicted ORF (ORF127ng) from 
N.gonorrhoeae: 

orf 127 .pep MTDNRGFTLVELISWLILSVLALIVYPSYRNYVEKAKINAVRAALLENAHFMEKFYLQN 60 

I I i I I I [ M t I I M I M I I I I i I I t I I I [ I I I I I I I I I 1 I I I I i i : I I I M t i I I I I I I I 
orf127ng MTDNRGFTLVELISWLILSVLALIVYPSYRNYVEKAKINAVRAAFLENAHFMEKFYLQN 60 



orf 127 .pep GRFKQTSTKWPSLPIKEAEGFCIRLNGIVARXALDSKFMLKAVAIDKDKNPFIIKMNENL 120 

[lilt Ill tlllllllli II tlllllllMMtlllllllllllllll 

orfl27ng GRFKQTSTKWPSLPIKEAEGFCIRLNGI-ARGALDSKFMLKAVAIDKDKNPFIIKMNENL 119 

orf 127 . pep VTFICKKSASSCSDGLDYFKGNDKDCKLLK 150 

I I I I I II I M I M I I I t I I I I I I I I I I I I 
orfl27ng VTFICKKSASSCSDRLDYFKGNDKDCKLLK 149 

The complete length ORF127ng nucleotide sequence <SEQ ID 825> is: 



1 ATGACTGATA ATCGGGGGTT TACACTGGTT GAATTAATAT CAGTGGTCTT 

51 GATATTGTCT GTACTTGCTT TAATTGTTTA TCCGAGCTAT CGCAATTATG 

101 TTGAGAAAGC AAAGATAAAT GCAGTGCGGG CAGCCTTGTT AGAAAATGCA 

151 CATTTTATGG AAAAGTTTTA TCTGCAGAAT GGGAGATTTA AACAAACATC 

201 TACCAAATGG CCAAGTTTGC CGATTAAAGA GGCAGAAGGC TTTTGTATCC 

251 GTTTGAATGG AATCGCGCGC GGGGCTTTAG ACAGTAAATT CATGTTGAAG 

301 GCGGTAGCCA TAGATAAAGA TAAAAATCCT TTTATTATTA AGATGAATGA 

351 AAATCTAGTA ACCTTTATTT GCAAGAAGTC CGCCAGTTCG TGTAGTGACG 

401 GGCTGGATTA TTTTAAAGGA AATGATAAGG ACTGCAAGTT ACTTAAGTAG 

This encodes a protein having amino acid sequence <SEQ ID 826>: 



1 MTDNRGFTLV ELISVVLILS VLALIVYPSY RNYVEKAKIN AVRAAFLENA 
51 HFMEKFYLQN GRFKQTSTKW PSLPIKEAEG FCIRLNGIAR GALDSKFMLK 
101 AVAIDKDKNP FIIKMNENLV TFICKKSASS CSDRLDYFKG NDKDCKLLK* 

ORF127ng and ORF127-1 show 100.0% identity in 149 aa overlap: 

10 20 30 40 50 60 

orf127-1.pep MTDNRGFTLVELISWLILSVLALIVYPSYRNYVEKAKINAVRAALLENAHFMEKFYLQN 
t I I I II t I I I i I It I I I I I M I I I M t I I I I II t M I I I I I II I t I I i I II I I I I t I I I I 
orfl27ng-l MTDNRGFTLVELISWLILSVLALIVYPSYRNYVEKAKINAVRAALLENAHFMEKFYLQN 

10 20 30 40 50 60 



70 80 90 100 110 120 

orf127-1.pep GRFKQTSTKWPSLPIKEAEGFCIRLNGIARGALDSKFMLKAVAIDKDKNPFIIKMNENLV 
IIIIMIIilllllllllllltlllllllltillllllllllllllllltlliltlllll 
orfl27ng-l GRFKQTSTKWPSLPIKEAEGFCIRLNGIARGALDSKFMLKAVAIDKDKNPFIIKMNENLV 

70 80 90 100 110 120 
130 140 150 

orf 127-1 . pep TFICKKSASSCSDGLDYFKGNDKDCKLLKX 
M I I I I I I I I M I I I i I I I I I I I I I I I i I 1 
orfl27ng-l TFICKKSASSCSDGLDYFKGNDKDCKLLKX 

130 140 150 

This analysis, including the fact that the predicted transmembrane domain is shared by the 
meningococcal and gonococcal proteins, suggests that the proteins from N.meningitidis and 
N.gonorrhoeae, and their epitopes, could be useful antigens for vaccines or diagnostics, or for 
raising antibodies. 
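
The nucleotide and amino acid sequences in these Examples are set out in numbered rows, each row beginning with the position of its first residue followed by groups of ten residues. Purely as an illustration, such a listing can be joined into a single contiguous string as sketched below in Python. The function name `parse_listing` and the consistency check on the row numbers are assumptions made for this sketch rather than a description of any tool used in the analysis. The input shown is the first two rows of the ORF127a nucleotide sequence given above.

```python
def parse_listing(lines):
    """Join a numbered sequence listing into one contiguous string."""
    parts = []
    length = 0
    for line in lines:
        fields = line.split()
        if not fields:
            continue  # skip blank lines between rows
        start = int(fields[0])
        # Each row number should be one more than the residues read so far.
        if start != length + 1:
            raise ValueError(f"row starts at {start}, expected {length + 1}")
        chunk = "".join(fields[1:]).upper()
        parts.append(chunk)
        length += len(chunk)
    return "".join(parts)


listing = [
    "1 ATGACTGATA ATCGGGGGTT TACGCTGGTT GAATTAATAT CAGTGGTCTT",
    "",
    "51 GATATTGTCT GTACTTGCTT TAATTGTTTA TCCGAGCTAT CGCAATTATG",
]
sequence = parse_listing(listing)
print(len(sequence), sequence[:10])
```

The check on the row numbers is a simple way of detecting a dropped or duplicated residue in a transcribed listing, since any such error shifts every subsequent row away from its stated starting position.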



Example 98 

The following partial DNA sequence was identified in N.meningitidis <SEQ ID 827>: 

1 . .GTGTCGCTGG CTTCGGTGAT TGCCTCTCAA ATCTTCCTTT ACGAAGATTT 
51 CAACCAAATG CGGAAAACCg GTGGAGCTAT CTGCGGTTTT CTTGTCCAAT 
101 ATTTATCTGG GGTTTCAGCA GGGGTATTTC GATTTGAGTG CCGACGAGAA 
151 CCCCGTACTG CATATCTGGT CTTTGGCAGT AGAGGAACAG TATTACCTCC 
201 TGTATCCCCT TTTGCTGATA TTTTGCTGCA AAAAAACCAA ATCGCTACGG 
251 GTGCTGCGTA ACATCAGCAT CATCCTGTTT TTGATTTTGA CTGCCTCATC 
301 GTTTTTGCCA AGCGGGTTTT ATACCGACAT CCTCAACCAA CCCAATACTT 
351 ATTACCTTTC GACACTGAGG TTTCCCGAGC TGTTGGCAGG TTCGCTGCTG 
401 GCGGTTTACG GGCAAACGCA AAACGGCAGA CGGCAAACAG CAAATGGAAA 
451 ACGGCAGTTG CTTTCATCAC TCTGCTTCGG CGCATTGCTT GCCTGCCTGT 
501 TCGTGATTGA CAAACACAAT CCGTTTATCC CGGGAATGAC CCTGCTCCTT 
551 CCCTGCCTGC TGACGGCACT GCTTATCCGG AGTATGCAAT ACGGGACACT 
601 TCCGACCCGC ATCCTGTCGG CAAGCCCCAT CGTATTTGTC GGCAAAATCT 
651 CTTATTCCCT ATACCTGTAC CATTGGATTT TTATTGCTTT CGCTCCGCTC 
701 ATTAGAGGCG GGAAACAGCT CGGACTGCCT GCCG. . 

This corresponds to the amino acid sequence <SEQ ID 828; ORF128>: 

1 . . VSLASVIASQ IFLYEDFNQM RKTVELSAVF LSNIYLGFQQ GYFDLSADEN 
51 PVLHIWSLAV EEQYYLLYPL LLIFCCKKTK SLRVLRNISI ILFLILTASS 
101 FLPSGFYTDI LNQPNTYYLS TLRFPELLAG SLLAVYGQTQ NGRRQTANGK 
151 RQLLSSLCFG ALLACLFVID KHNPFIPGMT LLLPCLLTAL LIRSMQYGTL 
201 PTRILSASPI VFVGKISYSL YLYHWIFIAF APLIRGGKQL GLPA. . 

Further work revealed the complete nucleotide sequence <SEQ ID 829>: 

1 ATGCAAGCTG TCCGATACAG ACCGGAAATT GACGGATTGC GGGCCGTCGC 

51 CGTGCTATCC GTCATGATTT TCCACCTGAA TAACCGCTGG CTGCCCGGAG 

101 GATTCCTGGG GGTGGACATT TTCTTTGTCA TCTCAGGATT CCTCATTACC 

151 GGCATCATTC TTTCTGAAAT ACAGAACGGT TCTTTTTCTT TCCGGGATTT 

201 TTATACCCGC AGGATTAAGC GGATTTATCC TGCCTTTATT GCGGCCGTGT 

251 CGCTGGCTTC GGTGATTGCC TCTCAAATCT TCCTTTACGA AGATTTCAAC 

301 CAAATGCGGA AAACCGTGGA GCTTTCTGCG GTTTTCTTGT CCAATATTTA 

351 TCTGGGGTTT CAGCAGGGGT ATTTCGATTT GAGTGCCGAC GAGAACCCCG 

401 TACTGCATAT CTGGTCTTTG GCAGTAGAGG AACAGTATTA CCTCCTGTAT 

451 CCCCTTTTGC TGATATTTTG CTGCAAAAAA ACCAAATCGC TACGGGTGCT 

501 GCGTAACATC AGCATCATCC TGTTTTTGAT TTTGACTGCC TCATCGTTTT 

551 TGCCAAGCGG GTTTTATACC GACATCCTCA ACCAACCCAA TACTTATTAC 

601 CTTTCGACAC TGAGGTTTCC CGAGCTGTTG GCAGGTTCGC TGCTGGCGGT 

651 TTACGGGCAA ACGCAAAACG GCAGACGGCA AACAGCAAAT GGAAAACGGC 

701 AGTTGCTTTC ATCACTCTGC TTCGGCGCAT TGCTTGCCTG CCTGTTCGTG 

751 ATTGACAAAC ACAATCCGTT TATCCCGGGA ATGACCCTGC TCCTTCCCTG 

801 CCTGCTGACG GCACTGCTTA TCCGGAGTAT GCAATACGGG ACACTTCCGA 

851 CCCGCATCCT GTCGGCAAGC CCCATCGTAT TTGTCGGCAA AATCTCTTAT 

901 TCCCTATACC TGTACCATTG GATTTTTATT GCTTTCGCCC ATTACATTAC 

951 AGGCGACAAA CAGCTCGGAC TGCCTGCCGT ATCGGCGGTT GCCGCGTTGA 

1001 CGGCCGGATT TTCCCTGTTG AGTTATTATT TGATTGAACA GCCGCTTAGA 

1051 AAACGGAAGA TGACCTTCAA AAAGGCATTT TTCTGCCTCT ATCTCGCCCC 

1101 GTCCCTGATA CTTGTCGGTT ACAACCTGTA CGCAAGGGGG ATATTGAAAC 

1151 AGGAACACCT CCGCCCGTTG CCCGGCGCGC CCCTTGCTGC GGAAAATCAT 
1201 TTTCCGGAAA CCGTCCTGAC CCTCGGCGAC TCGCACGCCG GACACCTGAG 

1251 GGGGTTTCTG GATTATGTCG GCAGCCGGGA AGGGTGGAAA GCCAAAATCC 

1301 TGTCCCTCGA TTCGGAGTGT TTGGTTTGGG TAGATGAGAA GCTGGCAGAC 

1351 AACCCGTTAT GTCGAAAATA CCGGGATGAA GTTGAAAAAG CCGAAGCCGT 

1401 TTTCATTGCC CAATTCTATG ATTTGAGGAT GGGCGGCCAG CCTGTGCCGA 

1451 GATTTGAAGC GCAATCCTTC CTAATACCCG GGTTCCCAGC CCGATTCAGG 

1501 GAAACCGTCA AAAGGATAGC CGCCGTCAAA CCCGTCTATG TTTTTGCAAA 

1551 CAACACATCA ATCAGCCGTT CGCCCCTGAG GGAGGAAAAA TTGAAAAGAT 

1601 TTGCCGCAAA CCAATATCTC CGCCCCATTC AGGCTATGGG CGACATCGGC 

1651 AAGAGCAATC AGGCGGTCTT TGATTTGATT AAAGATATTC CCAATGTGCA 

1701 TTGGGTGGAC GCACAAAAAT ACCTGCCCAA AAACACGGTC GAAATATACG 

1751 GCCGCTATCT TTACGGCGAC CAAGACCACC TGACCTATTT CGGTTCTTAT 

1801 TATATGGGGC GGGAATTCCA CAAACACGAA CGCCTGCTTA AATCTTCCCA 

1851 CGGCGGCGCA TTGCAGTAG 

This corresponds to the amino acid sequence <SEQ ID 830; ORF128-1>: 



1 MQAVRYRPEI DGLRAVAVLS VMIFHLNNRW LPGGFLGVDI FFVISGFLIT 

51 GIILSEIQNG SFSFRDFYTR RIKRIYPAFI AAVSLASVIA SQIFLYEDFN 

101 QMRKTVELSA VFLSNIYLGF QQGYFDLSAD ENPVLHIWSL AVEEQYYLLY 

151 PLLLIFCCKK TKSLRVLRNI SIILFLILTA SSFLPSGFYT DILNQPNTYY 

201 LSTLRFPELL AGSLLAVYGQ TQNGRRQTAN GKRQLLSSLC FGALLACLFV 

251 IDKHNPFIPG MTLLLPCLLT ALLIRSMQYG TLPTRILSAS PIVFVGKISY 

301 SLYLYHWIFI AFAHYITGDK QLGLPAVSAV AALTAGFSLL SYYLIEQPLR 

351 KRKMTFKKAF FCLYLAPSLI LVGYNLYARG ILKQEHLRPL PGAPLAAENH 

401 FPETVLTLGD SHAGHLRGFL DYVGSREGWK AKILSLDSEC LVWVDEKLAD 

451 NPLCRKYRDE VEKAEAVFIA QFYDLRMGGQ PVPRFEAQSF LIPGFPARFR 

501 ETVKRIAAVK PVYVFANNTS ISRSPLREEK LKRFAANQYL RPIQAMGDIG 

551 KSNQAVFDLI KDIPNVHWVD AQKYLPKNTV EIYGRYLYGD QDHLTYFGSY 

601 YMGREFHKHE RLLKSSHGGA LQ* 

Computer analysis of this amino acid sequence gave the following results: 



Homology with hypothetical integral membrane protein HI0392 of H.influenzae (accession number U32723) 
ORF128 and HI0392 show 52% aa identity in 180aa overlap: 

Orf128:   1 VSLASVIASQIFLYEDFNQMRKTVELSAVFLSNIYLGFQQGYFDLSADENPVLHIWSLAV 60
            ++L S IAS IF+Y DFN++RKT+EL+  FLSN YLG  QGYFDLSA+ENPVLHIWSLAV
HI0392:  46 MALVSFIASAIFIYNDFNKLRKTIELAIAFLSNFYLGLTQGYFDLSANENPVLHIWSLAV 105

Orf128:  61 EEQXXXXXXXXXIFCCKKTKSLRVLRNISIILFLILTASSFLPSGFYTDILNQPNTYYLS 120
            E Q         I   KK + ++VL  I++ILF IL A+SF+ + FY ++L+QPN YYLS
HI0392: 106 EGQYYLIFPLILILAYKKFREVKVLFIITLILFFILLATSFVSANFYKEVLHQPNIYYLS 165

Orf128: 121 TLRFPELLAGSLLAVYGQTQNGRRQTANGKRQLLSSLCFGALLACLFVIDKHNPFIPGMT 180
             LRFPELL GSLLA+Y    N + Q +     +L+ L    L +CLF+++ +  FIPG+T
HI0392: 166 NLRFPELLVGSLLAIYHNLSN-KVQLSKQVNNILAILSTLLLFSCLFLMNNNIAFIPGIT 224

Homology with a predicted ORF from N. meningitidis (strain A) 

ORF128 shows 98.0% identity over a 244 aa overlap with an ORF (ORF128a) from strain A of N. meningitidis: 

                                                 10        20        30
orf128.pep                               VSLASVIASQIFLYEDFNQMRKTVELSAVF
                                         ||||||||||||||||||||||||||||||
orf128a    ILSEIQNGSFSFRDFYTRRIKRIYPAFIAAVSLASVIASQIFLYEDFNQMRKTVELSAVF
                 60        70        80        90       100       110

                   40        50        60        70        80        90
orf128.pep LSNIYLGFQQGYFDLSADENPVLHIWSLAVEEQYYLLYPLLLIFCCKKTKSLRVLRNISI
           ||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
orf128a    LSNIYLGFQQGYFDLSADENPVLHIWSLAVEEQYYLLYPLLLIFCCKKTKSLRVLRNISI
                120       130       140       150       160       170

                  100       110       120       130       140       150
orf128.pep ILFLILTASSFLPSGFYTDILNQPNTYYLSTLRFPELLAGSLLAVYGQTQNGRRQTANGK
           ||||||||:|||||||||||||||||||||||||||||||||||||||||||||||||||
orf128a    ILFLILTATSFLPSGFYTDILNQPNTYYLSTLRFPELLAGSLLAVYGQTQNGRRQTANGK
                180       190       200       210       220       230

                  160       170       180       190       200       210
orf128.pep RQLLSSLCFGALLACLFVIDKHNPFIPGMTLLLPCLLTALLIRSMQYGTLPTRILSASPI
           ||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
orf128a    RQLLSSLCFGALLACLFVIDKHNPFIPGMTLLLPCLLTALLIRSMQYGTLPTRILSASPI
                240       250       260       270       280       290

                  220       230       240
orf128.pep VFVGKISYSLYLYHWIFIAFAPLIRGGKQLGLPA
           |||||||||||||||||||||  | | |||||||
orf128a    VFVGKISYSLYLYHWIFIAFAHYITGDKQLGLPAVSAVAALTAGFSLLSYYLIEQPLRKR
                300       310       320       330       340       350

orf128a    KMTFKKAFFCLYLAPSLILVGYNLYARGILKQEHLRPLPGAPLAAENHFPETVLTLGDSH
                360       370       380       390       400       410



The complete length ORF128a nucleotide sequence <SEQ ID 831> is: 



   1 ATGCAAGCTG TCCGATACAG ACCGGAAATT GACGGATTGC GGGCCGTCGC
  51 CGTGCTATCC GTCATGATTT TCCACCTGAA TAACCGCTGG CTGCCCGGAG
 101 GATTCCTGGG GGTGGACATT TTCTTTGTCA TCTCAGGATT CCTCATTACC
 151 GGCATCATTC TTTCTGAAAT ACAGAACGGT TCTTTTTCTT TCCGGGATTT
 201 TTATACCCGC AGGATTAAGC GGATTTATCC TGCTTTTATT GCGGCCGTGT
 251 CGCTGGCTTC GGTGATTGCC TCTCAAATCT TCCTTTACGA AGATTTCAAC
 301 CAAATGCGGA AAACCGTGGA GCTTTCTGCG GTTTTCTTGT CCAATATTTA
 351 TCTGGGGTTT CAGCAGGGGT ATTTCGATTT GAGTGCCGAC GAGAACCCCG
 401 TACTGCATAT CTGGTCTTTG GCAGTAGAGG AACAGTATTA CCTCCTGTAT
 451 CCTCTTTTGC TGATATTTTG CTGCAAAAAA ACAAAATCGC TACGGGTGCT
 501 GCGTAACATC AGCATCATCC TATTTCTGAT TTTGACTGCC ACATCGTTTT
 551 TGCCAAGCGG GTTTTATACC GATATTCTCA ACCAACCCAA TACTTATTAC
 601 CTTTCGACAC TGAGGTTTCC CGAGCTGTTG GCAGGTTCGC TGCTGGCGGT
 651 TTACGGGCAA ACGCAAAACG GCAGACGGCA AACAGCAAAT GGAAAACGGC
 701 AGTTGCTTTC ATCACTCTGC TTCGGCGCAT TGCTTGCCTG CCTGTTCGTG
 751 ATTGACAAAC ACAATCCGTT TATCCCGGGA ATGACCCTGC TCCTTCCCTG
 801 CCTGCTGACG GCACTGCTTA TCCGGAGTAT GCAATACGGG ACACTTCCGA
 851 CCCGCATCCT GTCGGCAAGC CCCATCGTAT TTGTCGGCAA AATCTCTTAT
 901 TCCCTATACC TGTACCATTG GATTTTTATT GCTTTCGCCC ATTACATTAC
 951 AGGCGACAAA CAGCTCGGAC TGCCTGCCGT ATCGGCGGTT GCCGCGTTGA
1001 CGGCCGGATT TTCCCTGTTG AGTTATTATT TGATTGAACA GCCGCTTAGA
1051 AAACGGAAGA TGACCTTCAA AAAGGCATTT TTCTGCCTCT ATCTCGCCCC
1101 GTCCCTGATA CTTGTCGGTT ACAACCTGTA CGCAAGGGGG ATATTGAAAC
1151 AGGAACACCT CCGCCCGTTG CCCGGCGCGC CCCTTGCTGC GGAAAATCAT
1201 TTTCCGGAAA CCGTCCTGAC CCTCGGCGAC TCGCACGCCG GACACCTGCG
1251 GGGGTTTCTG GATTATGTCG GCAGCCGGGA AGGGTGGAAA GCCAAAATCC
1301 TGTCCCTCGA TTCGGAGTGT TTGGTTTGGG TAGATGAGAA GCTGGCAGAC
1351 AACCCGTTAT GTCGAAAATA CCGGGATGAA GTTGAAAAAG CCGAAGCCGT
1401 TTTCATTGCC CAATTCTATG ATTTGAGGAT GGGCGGCCAG CCCGTGCCGA
1451 GATTTGAAGC GCAATCCTTC CTAATACCCG GGTTCCCAGC CCGATTCAGG
1501 GAAACCGTCA AAAGGATAGC CGCCGTCAAA CCCGTCTATG TTTTTGCAAA
1551 CAACACATCA ATCAGCCGTT CGCCCCTGAG GGAGGAAAAA TTGAAAAGAT
1601 TTGCCGCAAA CCAATATCTC CGCCCCATTC AGGCTATGGG CGACATCGGC
1651 AAGAGCAATC AGGCGGTCTT TGATTTGATT AAAGATATTC CCAATGTGCA
1701 TTGGGTGGAC GCACAAAAAT ACCTGCCCAA AAACACGGTC GAAATATACG
1751 GCCGCTATCT TTACGGCGAC CAAGACCACC TGACCTATTT CGGTTCTTAT
1801 TATATGGGGC GGGAATTTCA CAAACACGAA CGCCTGCTTA AATCTTCTCG
1851 CGACGGCGCA TTGCAGTAG



This encodes a protein having amino acid sequence <SEQ ID 832>: 



  1 MQAVRYRPEI DGLRAVAVLS VMIFHLNNRW LPGGFLGVDI FFVISGFLIT
 51 GIILSEIQNG SFSFRDFYTR RIKRIYPAFI AAVSLASVIA SQIFLYEDFN
101 QMRKTVELSA VFLSNIYLGF QQGYFDLSAD ENPVLHIWSL AVEEQYYLLY
151 PLLLIFCCKK TKSLRVLRNI SIILFLILTA TSFLPSGFYT DILNQPNTYY
201 LSTLRFPELL AGSLLAVYGQ TQNGRRQTAN GKRQLLSSLC FGALLACLFV
251 IDKHNPFIPG MTLLLPCLLT ALLIRSMQYG TLPTRILSAS PIVFVGKISY
301 SLYLYHWIFI AFAHYITGDK QLGLPAVSAV AALTAGFSLL SYYLIEQPLR
351 KRKMTFKKAF FCLYLAPSLI LVGYNLYARG ILKQEHLRPL PGAPLAAENH
401 FPETVLTLGD SHAGHLRGFL DYVGSREGWK AKILSLDSEC LVWVDEKLAD
451 NPLCRKYRDE VEKAEAVFIA QFYDLRMGGQ PVPRFEAQSF LIPGFPARFR
501 ETVKRIAAVK PVYVFANNTS ISRSPLREEK LKRFAANQYL RPIQAMGDIG
551 KSNQAVFDLI KDIPNVHWVD AQKYLPKNTV EIYGRYLYGD QDHLTYFGSY
601 YMGREFHKHE RLLKSSRDGA LQ*

ORF128a and ORF128-1 show 99.5% identity in 622 aa overlap: 



orf128a.pep MQAVRYRPEIDGLRAVAVLSVMIFHLNNRWLPGGFLGVDIFFVISGFLITGIILSEIQNG
            ||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
orf128-1    MQAVRYRPEIDGLRAVAVLSVMIFHLNNRWLPGGFLGVDIFFVISGFLITGIILSEIQNG

orf128a.pep SFSFRDFYTRRIKRIYPAFIAAVSLASVIASQIFLYEDFNQMRKTVELSAVFLSNIYLGF
            ||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
orf128-1    SFSFRDFYTRRIKRIYPAFIAAVSLASVIASQIFLYEDFNQMRKTVELSAVFLSNIYLGF

orf128a.pep QQGYFDLSADENPVLHIWSLAVEEQYYLLYPLLLIFCCKKTKSLRVLRNISIILFLILTA
            ||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
orf128-1    QQGYFDLSADENPVLHIWSLAVEEQYYLLYPLLLIFCCKKTKSLRVLRNISIILFLILTA

orf128a.pep TSFLPSGFYTDILNQPNTYYLSTLRFPELLAGSLLAVYGQTQNGRRQTANGKRQLLSSLC
            :|||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
orf128-1    SSFLPSGFYTDILNQPNTYYLSTLRFPELLAGSLLAVYGQTQNGRRQTANGKRQLLSSLC

orf128a.pep FGALLACLFVIDKHNPFIPGMTLLLPCLLTALLIRSMQYGTLPTRILSASPIVFVGKISY
            ||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
orf128-1    FGALLACLFVIDKHNPFIPGMTLLLPCLLTALLIRSMQYGTLPTRILSASPIVFVGKISY

orf128a.pep SLYLYHWIFIAFAHYITGDKQLGLPAVSAVAALTAGFSLLSYYLIEQPLRKRKMTFKKAF
            ||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
orf128-1    SLYLYHWIFIAFAHYITGDKQLGLPAVSAVAALTAGFSLLSYYLIEQPLRKRKMTFKKAF

orf128a.pep FCLYLAPSLILVGYNLYARGILKQEHLRPLPGAPLAAENHFPETVLTLGDSHAGHLRGFL
            ||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
orf128-1    FCLYLAPSLILVGYNLYARGILKQEHLRPLPGAPLAAENHFPETVLTLGDSHAGHLRGFL

orf128a.pep DYVGSREGWKAKILSLDSECLVWVDEKLADNPLCRKYRDEVEKAEAVFIAQFYDLRMGGQ
            ||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
orf128-1    DYVGSREGWKAKILSLDSECLVWVDEKLADNPLCRKYRDEVEKAEAVFIAQFYDLRMGGQ

orf128a.pep PVPRFEAQSFLIPGFPARFRETVKRIAAVKPVYVFANNTSISRSPLREEKLKRFAANQYL
            ||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
orf128-1    PVPRFEAQSFLIPGFPARFRETVKRIAAVKPVYVFANNTSISRSPLREEKLKRFAANQYL

orf128a.pep RPIQAMGDIGKSNQAVFDLIKDIPNVHWVDAQKYLPKNTVEIYGRYLYGDQDHLTYFGSY
            ||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
orf128-1    RPIQAMGDIGKSNQAVFDLIKDIPNVHWVDAQKYLPKNTVEIYGRYLYGDQDHLTYFGSY

orf128a.pep YMGREFHKHERLLKSSRDGALQX
            ||||||||||||||||: |||||
orf128-1    YMGREFHKHERLLKSSHGGALQX
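
As an illustration only (the specification does not state which program or scoring options produced these figures), the percent identity quoted for a pairwise alignment can be recomputed by counting identical columns. The helper name `percent_identity` is hypothetical, and the input is assumed to be two pre-aligned sequences of equal length. Applied to all 622 columns of the ORF128a/ORF128-1 alignment, which differ at three positions, the same arithmetic gives 619/622, i.e. 99.5%.

```python
def percent_identity(seq_a: str, seq_b: str) -> tuple[int, int, float]:
    """Count identical columns between two pre-aligned, equal-length sequences."""
    if len(seq_a) != len(seq_b):
        raise ValueError("aligned sequences must have the same length")
    # Skip columns in which either sequence carries a gap character.
    columns = [(a, b) for a, b in zip(seq_a, seq_b) if a != "-" and b != "-"]
    matches = sum(1 for a, b in columns if a == b)
    return matches, len(columns), 100.0 * matches / len(columns)


# Residues 601-622 of ORF128a and ORF128-1, copied from the alignment above.
orf128a_tail = "YMGREFHKHERLLKSSRDGALQ"
orf128_1_tail = "YMGREFHKHERLLKSSHGGALQ"

matches, length, pct = percent_identity(orf128a_tail, orf128_1_tail)
print(matches, length, round(pct, 1))  # 20 22 90.9
```

The two mismatches in this C-terminal stretch (R/H and D/G) are two of the three differences between the full-length sequences; the third is the T/S substitution at residue 181.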



Homology with a predicted ORF from N. gonorrhoeae 

ORF128 shows 93.4% identity over a 244 aa overlap with a predicted ORF (ORF128ng) from N. gonorrhoeae: 



orf128.pep                               VSLASVIASQIFLYEDFNQMRKTVELSAVF 30
                                         |||||||||||||||||||||||:|||:||
orf128ng   ILSEIQNGSFSFRDFYTRRIKRIYPAFIAAVSLASVIASQIFLYEDFNQMRKTIELSTVF 112

orf128.pep LSNIYLGFQQGYFDLSADENPVLHIWSLAVEEQYYLLYPLLLIFCCKKTKSLRVLRNISI 90
           ||||||||: ||||||||||||||||||||||||||||||||||| ||||||||||||||
orf128ng   LSNIYLGFRLGYFDLSADENPVLHIWSLAVEEQYYLLYPLLLIFCYKKTKSLRVLRNISI 172

orf128.pep ILFLILTASSFLPSGFYTDILNQPNTYYLSTLRFPELLAGSLLAVYGQTQNGRRQTANGK 150
           |||||||||||||:||||||||||||||||||||||||:||||||||||||||||| |||
orf128ng   ILFLILTASSFLPAGFYTDILNQPNTYYLSTLRFPELLVGSLLAVYGQTQNGRRQTENGK 232

orf128.pep RQLLSSLCFGALLACLFVIDKHNPFIPGMTLLLPCLLTALLIRSMQYGTLPTRILSASPI 210
           ||||| |||||||:||||||||:|||||:|||||||||||||||||||||||||||||||
orf128ng   RQLLSLLCFGALLVCLFVIDKHDPFIPGITLLLPCLLTALLIRSMQYGTLPTRILSASPI 292

orf128.pep VFVGKISYSLYLYHWIFIAFAPLIRGGKQLGLPA 244
           |||||||||||||||||||||  | | |||||||
orf128ng   VFVGKISYSLYLYHWIFIAFAHYITGDKQLGLPAVSAVAALTAGFSLLSYYLIEQPLRKR 352

The complete length ORF128ng nucleotide sequence <SEQ ID 833> is: 



   1 ATGCAAGCTG TCCGATACAG GCCTGAAATT GACGGATTGC GGGCCGTCGC
  51 CGTGCTATCC GTCATTATTT TCCACCTGAA TAACCGCTGG CTGCCCGGAG
 101 GATTCCTGGG GGTGGACATT TTCTTTGTCA TCTCGGGATT CCTCATTACC
 151 AACATCATTC TTTCTGAAAT ACAGAACGGT TCTTTTTCTT TCCGGGATTT
 201 TTATACCCGC AGGATTAAGC GGATTTATCC TGCTTTTATT GCGGCCGTGT
 251 CCCTGGCTTC GGTGATTGCT TCTCAAATCT TCCTTTACGA AGATTTCAAC
 301 CAAATGAGGA AAACCATAGA GCTTTCTACG GTTTTTTTGT CCAATATTTA
 351 TTTGGGGTTC CGATTGGGGT ATTTCGATTT GAGTGCCGAC GAGAACCCCG
 401 TACTGCATAT CTGGTCTTTG GCGGTAGAGG AACAGTATTA CCTCCTGTAT
 451 CCTCTTTTGC TGATATTCTG TTACAAAAAA ACCAAATCAC TACGGGTGCT
 501 GCGTAATATC AGCATCATCC TGTTTCTGAT TTTGACCGCA TCATCGTTTT
 551 TGCCGGCCGG GTTTTATACC GACATCCTCA ACCAACCcaa TACTTATTAC
 601 CTTTCGACAC TGAGGTTTCC CGAGCTGTTG GTGGGTTCGC TGTTGGCGGT
 651 TTACGGGCAA ACGCAAAACG GCAGACGGCA AACAGAAAAT GGAAAACGGC
 701 AGTTGCTTTC ATTACTCTGT TTCGGCGCat tgCTTGTCTG CCTGTTCGTG
 751 ATCGACAAAC ACGATCCGTT TATCCCGGGA ATAACCCTGC TCCTTCCCTG
 801 CCTGCTGACG GCGCTGCTTA TCCGGAGTAT GCAATACGGG ACACTTCCGA
 851 CCCGCATCCT GTCGGCAAGC CCCATCGTAT TTGTCGGCAA AATCTCTTAT
 901 TCCCTATACC TGTACCATTG GATTTTTATT GCCTTCGCCC ATTACATTAC
 951 AGGCGACAAA CAGCTCGGAC TGCCTGCCGT ATCGGCGGTT GCCGCGTTGA
1001 CGGCCGGATT TTCCCTGTTG AGCTATTATT TGATTGAACA GCCGCTTAGA
1051 AAACGGAAGA TGACCTTCAA AAAGGCATTT TTCTGCCTTT ATCTCGCCCC
1101 GTCCCTGATG CTTGTCGGTT ACAACCTGTA TTCAAGAGGG ATATTGAAAC
1151 AGGAACACCT CCGCCCGCTG CCCGGCACGC CCGTTGCTGC GGAAAATAAT
1201 TTTCCGGAAA CCGTCTTGAC CCTCGGCGAC TCGCACGCCG GACACCTGCG
1251 GGGGTTTCTG GATTATGTCG GCGGCAGGGA AGGGTGGAAA GCTAAAATCC
1301 TGTCCCTCGA TTCGGAGTGT TTGGTTTGGG TGGATGAGAA GCTGGCAGAC
1351 AACCCGTTGT GCCGAAAATA CCGGGATGAA GTTGAAAAAG CCGAAGCTGT
1401 TTTCATTGCC CAATTCTATG ATTTGAGGAT GGGCGGCCAG CCCGTGCCGA
1451 GATTTGAAGC GCAATCCTTC CTGATACCCG GGTTCAAAGC CCGATTCAGG
1501 GAAACCGTCA AGAGGATAGC CGCCGTCAAA CCTGTATATG TTTTTGCAAA
1551 CAATACATCA ATCAGCCGTT CTCCCTTGAG GGAGGAAAAA TTGAAAAGAT
1601 TTGCTATAAA CCAATACCTC CGGCCTATTC GGGCTATGGG CGACATCGGC
1651 AAGAGCAATC AGGCGGTCTT TGATTTGGTT AAAGATATTC CCAATGTGCA
1701 TTGGGTGGAC GCACAAAAAT ACCTGCCCAA AAACACGGTC GAAATACACG
1751 GACGCTATCT TTACGGCGAC CAAGACCACC TGACCTATTT CGGTTCTTAT
1801 TATATGGGGC GGGAATTTCA CAAACACGAA CGCCTGCTCA AGCATTCCCG
1851 AGGCGGCGCA TTGCAGTAG



This encodes a protein having amino acid sequence <SEQ ID 834>: 



  1 MQAVRYRPEI DGLRAVAVLS VIIFHLNNRW LPGGFLGVDI FFVISGFLIT
 51 NIILSEIQNG SFSFRDFYTR RIKRIYPAFI AAVSLASVIA SQIFLYEDFN
101 QMRKTIELST VFLSNIYLGF RLGYFDLSAD ENPVLHIWSL AVEEQYYLLY
151 PLLLIFCYKK TKSLRVLRNI SIILFLILTA SSFLPAGFYT DILNQPNTYY
201 LSTLRFPELL VGSLLAVYGQ TQNGRRQTEN GKRQLLSLLC FGALLVCLFV
251 IDKHDPFIPG ITLLLPCLLT ALLIRSMQYG TLPTRILSAS PIVFVGKISY
301 SLYLYHWIFI AFAHYITGDK QLGLPAVSAV AALTAGFSLL SYYLIEQPLR
351 KRKMTFKKAF FCLYLAPSLM LVGYNLYSRG ILKQEHLRPL PGTPVAAENN
401 FPETVLTLGD SHAGHLRGFL DYVGGREGWK AKILSLDSEC LVWVDEKLAD
451 NPLCRKYRDE VEKAEAVFIA QFYDLRMGGQ PVPRFEAQSF LIPGFKARFR
501 ETVKRIAAVK PVYVFANNTS ISRSPLREEK LKRFAINQYL RPIRAMGDIG
551 KSNQAVFDLV KDIPNVHWVD AQKYLPKNTV EIHGRYLYGD QDHLTYFGSY
601 YMGREFHKHE RLLKHSRGGA LQ*

ORF128ng and ORF128-1 show 95.7% identity in 622 aa overlap: 



orf128-1.pep MQAVRYRPEIDGLRAVAVLSVMIFHLNNRWLPGGFLGVDIFFVISGFLITGIILSEIQNG
             |||||||||||||||||||||:||||||||||||||||||||||||||||:|||||||||
orf128ng     MQAVRYRPEIDGLRAVAVLSVIIFHLNNRWLPGGFLGVDIFFVISGFLITNIILSEIQNG

orf128-1.pep SFSFRDFYTRRIKRIYPAFIAAVSLASVIASQIFLYEDFNQMRKTVELSAVFLSNIYLGF
             |||||||||||||||||||||||||||||||||||||||||||||:|||:||||||||||
orf128ng     SFSFRDFYTRRIKRIYPAFIAAVSLASVIASQIFLYEDFNQMRKTIELSTVFLSNIYLGF

orf128-1.pep QQGYFDLSADENPVLHIWSLAVEEQYYLLYPLLLIFCCKKTKSLRVLRNISIILFLILTA
             : ||||||||||||||||||||||||||||||||||| ||||||||||||||||||||||
orf128ng     RLGYFDLSADENPVLHIWSLAVEEQYYLLYPLLLIFCYKKTKSLRVLRNISIILFLILTA

orf128-1.pep SSFLPSGFYTDILNQPNTYYLSTLRFPELLAGSLLAVYGQTQNGRRQTANGKRQLLSSLC
             |||||:||||||||||||||||||||||||:||||||||||||||||| |||||||| ||
orf128ng     SSFLPAGFYTDILNQPNTYYLSTLRFPELLVGSLLAVYGQTQNGRRQTENGKRQLLSLLC

orf128-1.pep FGALLACLFVIDKHNPFIPGMTLLLPCLLTALLIRSMQYGTLPTRILSASPIVFVGKISY
             |||||:||||||||:|||||:|||||||||||||||||||||||||||||||||||||||
orf128ng     FGALLVCLFVIDKHDPFIPGITLLLPCLLTALLIRSMQYGTLPTRILSASPIVFVGKISY

orf128-1.pep SLYLYHWIFIAFAHYITGDKQLGLPAVSAVAALTAGFSLLSYYLIEQPLRKRKMTFKKAF
             ||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
orf128ng     SLYLYHWIFIAFAHYITGDKQLGLPAVSAVAALTAGFSLLSYYLIEQPLRKRKMTFKKAF

orf128-1.pep FCLYLAPSLILVGYNLYARGILKQEHLRPLPGAPLAAENHFPETVLTLGDSHAGHLRGFL
             |||||||||:|||||||:||||||||||||||:|:||||:||||||||||||||||||||
orf128ng     FCLYLAPSLMLVGYNLYSRGILKQEHLRPLPGTPVAAENNFPETVLTLGDSHAGHLRGFL

orf128-1.pep DYVGSREGWKAKILSLDSECLVWVDEKLADNPLCRKYRDEVEKAEAVFIAQFYDLRMGGQ
             ||||:|||||||||||||||||||||||||||||||||||||||||||||||||||||||
orf128ng     DYVGGREGWKAKILSLDSECLVWVDEKLADNPLCRKYRDEVEKAEAVFIAQFYDLRMGGQ

orf128-1.pep PVPRFEAQSFLIPGFPARFRETVKRIAAVKPVYVFANNTSISRSPLREEKLKRFAANQYL
             ||||||||||||||| ||||||||||||||||||||||||||||||||||||||| ||||
orf128ng     PVPRFEAQSFLIPGFKARFRETVKRIAAVKPVYVFANNTSISRSPLREEKLKRFAINQYL

orf128-1.pep RPIQAMGDIGKSNQAVFDLIKDIPNVHWVDAQKYLPKNTVEIYGRYLYGDQDHLTYFGSY
             |||:|||||||||||||||:||||||||||||||||||||||:|||||||||||||||||
orf128ng     RPIRAMGDIGKSNQAVFDLVKDIPNVHWVDAQKYLPKNTVEIHGRYLYGDQDHLTYFGSY

orf128-1.pep YMGREFHKHERLLKSSHGGALQX
             |||||||||||||| |:||||||
orf128ng     YMGREFHKHERLLKHSRGGALQX
                    610       620

In addition, ORF128ng shows homology to a hypothetical H. influenzae protein: 

sp|P43993|Y392_HAEIN HYPOTHETICAL PROTEIN HI0392 >gi|1074385|pir||B64007 
hypothetical protein HI0392 - Haemophilus influenzae (strain Rd KW20) 
>gi|1573364 (U32723) H. influenzae predicted coding region HI0392 [Haemophilus 
influenzae] Length = 245 
Score = 239 bits (604), Expect = 3e-62 
Identities = 124/225 (55%), Positives = 152/225 (67%), Gaps = 1/225 (0%) 





Query:  38 VDIFFVISGFLITNIILSEIQNGSFSFRDFYTRRIKRIYPXXXXXXXXXXXXXXXXFLYE 97
           +DIFFVISGFLIT II++EIQ  SFS + FYTRRIKRIYP                F+Y
Sbjct:   1 MDIFFVISGFLITGIIITEIQQNSFSLKQFYTRRIKRIYPAFITVMALVSFIASAIFIYN 60

Query:  98 DFNQMRKTIELSTVFLSNIYLGFRLGYFDLSADENPVLHIWSLAVEEQXXXXXXXXXIFC 157
           DFN++RKTIEL+  FLSN YLG   GYFDLSA+ENPVLHIWSLAVE Q         I
Sbjct:  61 DFNKLRKTIELAIAFLSNFYLGLTQGYFDLSANENPVLHIWSLAVEGQYYLIFPLILILA 120

Query: 158 YKKTKSLRVLRNISIILFLILTASSFLPAGFYTDILNQPNTYYLSTLRFPELLVGSLLAV 217
           YKK + ++VL  I++ILF IL A+SF+ A FY ++L+QPN YYLS LRFPELLVGSLLA+
Sbjct: 121 YKKFREVKVLFIITLILFFILLATSFVSANFYKEVLHQPNIYYLSNLRFPELLVGSLLAI 180

Query: 218 YGQTQNGRRQTENGKRQLLSLLCFGALLVCLFVIDKHDPFIPGIT 262
           Y    N + Q       +L++L    L  CLF+++ +  FIPGIT
Sbjct: 181 YHNLSN-KVQLSKQVNNILAILSTLLLFSCLFLMNNNIAFIPGIT 224





This analysis, including the identification of several putative transmembrane domains, suggests that 
these proteins from N. meningitidis and N. gonorrhoeae, and their epitopes, could be useful antigens 
for vaccines or diagnostics, or for raising antibodies. 
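
The specification does not say which method was used to identify the putative transmembrane domains. Purely as an illustrative sketch, and not as the analysis actually performed, a sliding-window hydropathy scan on the Kyte-Doolittle scale is one common way to flag candidate membrane-spanning stretches. The function names, the 19-residue window and the 1.6 threshold below are assumptions chosen for illustration, and the code only accepts the 20 standard residue letters.

```python
# Kyte-Doolittle hydropathy index for the 20 standard amino acids.
KYTE_DOOLITTLE = {
    "A": 1.8, "R": -4.5, "N": -3.5, "D": -3.5, "C": 2.5, "Q": -3.5, "E": -3.5,
    "G": -0.4, "H": -3.2, "I": 4.5, "L": 3.8, "K": -3.9, "M": 1.9, "F": 2.8,
    "P": -1.6, "S": -0.8, "T": -0.7, "W": -0.9, "Y": -1.3, "V": 4.2,
}


def hydropathy_windows(protein: str, window: int = 19) -> list[float]:
    """Mean hydropathy of every full window along the sequence."""
    scores = [KYTE_DOOLITTLE[aa] for aa in protein]
    return [sum(scores[i:i + window]) / window
            for i in range(len(scores) - window + 1)]


def candidate_tm_starts(protein: str, window: int = 19,
                        threshold: float = 1.6) -> list[int]:
    """1-based start positions of windows whose mean hydropathy meets the threshold."""
    return [i + 1
            for i, score in enumerate(hydropathy_windows(protein, window))
            if score >= threshold]


# Residues 11-60 of ORF128-1 (SEQ ID 830), used here only as sample input.
segment = "DGLRAVAVLSVMIFHLNNRWLPGGFLGVDIFFVISGFLITGIILSEIQNG"
print(candidate_tm_starts(segment))
```

Start positions printed by the last line are relative to the sample segment, not to the full-length protein.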



Example 99 

The following partial DNA sequence was identified in N. meningitidis <SEQ ID 835>: 

1 ..ATTATTTACG AATACCGCTG GATGTTTCTT TACGGCGCAC TGACGACCTT 

51 GGGGCTGACG GTCGTGGCAA C.GCGGGCGG TTCGGTATTG GGTCTGTTGT 

101 TGGCGTTGGC GCGCCTGATT CACTTGGAAA AAGCCGGTGC GCCGATGCGC 

151 GTGCTGGCGT GGGCGTTGCG TAAAGTTTCG CTGCTGTATG TTACGCTGTT 

201 CCGGGGTACG CCGCTGTTTG TGCAGATTGT GATTTGGGCG TATGTGTGGT 

251 TTCCGTTTTT CGTC. . 

This corresponds to the amino acid sequence <SEQ ID 836; ORF129>: 



 1 ..IIYEYRWMFL YGALTTLGLT VVAXAGGSVL GLLLALARLI HLEKAGAPMR 

51 VLAWALRKVS LLYVTLFRGT PLFVQIVIWA YVWFPFFV.. 

Further work revealed the complete nucleotide sequence <SEQ ID 837>: 

1 ATGGATTTTC GTTTTGACAT TATTTACGAA TACCGCTGGA TGTTTCTTTA 

51 CGGCGCACTG ACGACCTTGG GGCTGACGGT CGTGGCAACG GCGGGCGGTT 

101 CGGTATTGGG TCTGTTGTTG GCGTTGGCGC GCCTGATTCA CTTGGAAAAA 

151 GCCGGTGCGC CGATGCGCGT GCTGGCGTGG GCGTTGCGTA AAGTTTCGCT 

201 GCTGTATGTT ACGCTGTTCC GGGGTACGCC GCTGTTTGTG CAGATTGTGA 

251 TTTGGGCGTA TGTGTGGTTT CCGTTTTTCG TCCATCCTTC AGACGGCATT 

301 TTGGTCAGCG GCGAGGCGGC AATCGCGCTG CGTCGCGGAT ACGGGCCGCT 

351 GATTGCCGGT TCTTTGGCAC TGATCGCCAA CTCGGGGGCG TATATCTGTG 

401 AGATTTTCCG CGCGGGCATC CAGTCTATAG ACAAAGGACA GATGGAGGCG 

451 GCGCGTTCTT TGGGGCTGAC CTATCCGCAG GCGATGCGCT ATGTGATTCT 

501 GCCGCAGGCA TTGCGCCGCA TGCTGCCGCC TTTGGCGAGC GAGTTCATCA 

551 CGCTCTTGAA AGACAGCTCG CTGCTGTCGG TCATTGCTGT GGCGGAGTTG 

601 GCGTATGTTC AGAATACGAT TACGGGCCGG TATTCGGTTT ATGAAGAACC 

651 GCTTTACACC GTCGCCCTGA TTTATCTGTT GATGACGACT TTCTTAGGCT 

701 GGATATTCCT GCGTTTGGAA AAACGTTACA ATCCGCAACA CCGCTGA 

This corresponds to the amino acid sequence <SEQ ID 838; ORF129-1>: 



  1 MDFRFDIIYE YRWMFLYGAL TTLGLTVVAT AGGSVLGLLL ALARLIHLEK
 51 AGAPMRVLAW ALRKVSLLYV TLFRGTPLFV QIVIWAYVWF PFFVHPSDGI
101 LVSGEAAIAL RRGYGPLIAG SLALIANSGA YICEIFRAGI QSIDKGQMEA
151 ARSLGLTYPQ AMRYVILPQA LRRMLPPLAS EFITLLKDSS LLSVIAVAEL
201 AYVQNTITGR YSVYEEPLYT VALIYLLMTT FLGWIFLRLE KRYNPQHR*
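
The correspondence between a nucleotide listing and its amino acid sequence can be checked by translating the open reading frame codon by codon. The following is a minimal sketch that assumes the standard genetic code and reading frame 1; the names `CODON_TABLE` and `translate` are illustrative and are not taken from the specification. Codons containing an ambiguous base are reported as `X`, and stop codons as `*`, matching the notation used in the listings.

```python
from itertools import product

# Standard genetic code, codons ordered TTT, TTC, TTA, TTG, TCT, ... GGG.
BASES = "TCAG"
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    "".join(codon): aa
    for codon, aa in zip(product(BASES, repeat=3), AMINO_ACIDS)
}


def translate(dna: str) -> str:
    """Translate a DNA string in frame 1, ignoring whitespace and case."""
    dna = "".join(dna.split()).upper()
    usable = len(dna) - len(dna) % 3  # drop any trailing partial codon
    return "".join(
        CODON_TABLE.get(dna[i:i + 3], "X")  # X marks an ambiguous codon
        for i in range(0, usable, 3)
    )


# First 30 and last 15 bases of SEQ ID 837, copied from the listing above.
print(translate("ATGGATTTTC GTTTTGACAT TATTTACGAA"))  # MDFRFDIIYE
print(translate("CCGCAACA CCGCTGA"))                  # PQHR*
```

Both outputs agree with the start (MDFRFDIIYE) and end (NPQHR*) of the ORF129-1 sequence given above.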

Computer analysis of this amino acid sequence gave the following results: 



Homology with a predicted ORF from N. meningitidis (strain A) 

ORF129 shows 98.9% identity over an 88 aa overlap with an ORF (ORF129a) from strain A of N. meningitidis: 



                         10        20        30        40        50
orf129.pep       IIYEYRWMFLYGALTTLGLTVVAXAGGSVLGLLLALARLIHLEKAGAPMRVLAW
                 |||||||||||||||||||||||:||||||||||||||||||||||||||||||
orf129a    MDFRFDIIYEYRWMFLYGALTTLGLTVVATAGGSVLGLLLALARLIHLEKAGAPMRVLAW
                   10        20        30        40        50        60

               60        70        80
orf129.pep ALRKVSLLYVTLFRGTPLFVQIVIWAYVWFPFFV
           ||||||||||||||||||||||||||||||||||
orf129a    ALRKVSLLYVTLFRGTPLFVQIVIWAYVWFPFFVHPSDGILVSGEAAIALRRGYGPLIAG
                   70        80        90       100       110       120

orf129a    SLALIANSGAYICEIFRAGIQSIDKGQMEAARSLGLTYPQAMRYVILPQALRRMLPPLAS
                  130       140       150       160       170       180

The complete length ORF129a nucleotide sequence <SEQ ID 839> is: 

1 ATGGATTTTC GTTTTGACAT TATTTACGAA TACCGCTGGA TGTTTCTTTA 
51 CGGCGCACTG ACGACCTTGG GGCTGACGGT CGTGGCGACG GCGGGCGGTT 
101 CGGTATTGGG TCTGTTGTTG GCGTTGGCGC GCCTGATTCA CTTGGAAAAA 

151 GCCGGTGCGC CGATGCGCGT GCTGGCGTGG GCGTTGCGTA AGGTTTCGCT 

201 GCTGTATGTT ACGCTGTTCC GGGGTACGCC GCTGTTTGTG CAGATTGTGA 

251 TTTGGGCGTA TGTGTGGTTT CCGTTTTTCG TCCATCCTTC AGACGGCATT 

301 TTGGTTAGCG GCGAGGCGGC AATCGCGCTG CGTCGCGGAT ACGGGCCGCT 

351 GATTGCCGGT TCTTTGGCAC TGATCGCCAA CTCGGGGGCG TATATCTGTG 

401 AGATTTTCCG CGCGGGCATC CAGTCTATAG ACAAAGGACA GATGGAGGCG 

451 GCGCGTTCTT TGGGGCTGAC CTATCCGCAG GCGATGCGCT ATGTGATTCT 

501 GCCGCAGGCA TTGCGCCGTA TGCTGCCGCC TTTGGCGAGC GAGTTCATCA 

551 CGCTCTTGAA AGACAGCTCG CTGCTGTCGG TCATTGCTGT GGCGGAGTTG 

601 GCGTATGTTC AGAATACGAT TACGGGCCGG TATTCGGTTT ATGAAGAACC 

651 GCTTTACACC GTCGCCCTGA TTTATCTGTT GATGACGACT TTCTTAGGCT 

701 GGATATTCCT GCGTTTGGAA AAACGTTACA ATCCGCAACA CCGCTGA 

This encodes a protein having amino acid sequence <SEQ ID 840>: 

  1 MDFRFDIIYE YRWMFLYGAL TTLGLTVVAT AGGSVLGLLL ALARLIHLEK
 51 AGAPMRVLAW ALRKVSLLYV TLFRGTPLFV QIVIWAYVWF PFFVHPSDGI
101 LVSGEAAIAL RRGYGPLIAG SLALIANSGA YICEIFRAGI QSIDKGQMEA
151 ARSLGLTYPQ AMRYVILPQA LRRMLPPLAS EFITLLKDSS LLSVIAVAEL
201 AYVQNTITGR YSVYEEPLYT VALIYLLMTT FLGWIFLRLE KRYNPQHR*

ORF129a and ORF129-1 show 100.0% identity in 248 aa overlap: 



orf129a.pep MDFRFDIIYEYRWMFLYGALTTLGLTVVATAGGSVLGLLLALARLIHLEKAGAPMRVLAW
            ||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
orf129-1    MDFRFDIIYEYRWMFLYGALTTLGLTVVATAGGSVLGLLLALARLIHLEKAGAPMRVLAW

orf129a.pep ALRKVSLLYVTLFRGTPLFVQIVIWAYVWFPFFVHPSDGILVSGEAAIALRRGYGPLIAG
            ||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
orf129-1    ALRKVSLLYVTLFRGTPLFVQIVIWAYVWFPFFVHPSDGILVSGEAAIALRRGYGPLIAG

orf129a.pep SLALIANSGAYICEIFRAGIQSIDKGQMEAARSLGLTYPQAMRYVILPQALRRMLPPLAS
            ||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
orf129-1    SLALIANSGAYICEIFRAGIQSIDKGQMEAARSLGLTYPQAMRYVILPQALRRMLPPLAS

orf129a.pep EFITLLKDSSLLSVIAVAELAYVQNTITGRYSVYEEPLYTVALIYLLMTTFLGWIFLRLE
            ||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
orf129-1    EFITLLKDSSLLSVIAVAELAYVQNTITGRYSVYEEPLYTVALIYLLMTTFLGWIFLRLE

orf129a.pep KRYNPQHRX
            |||||||||
orf129-1    KRYNPQHRX



Homology with a predicted ORF from N. gonorrhoeae 

ORF129 shows 98.9% identity over an 88 aa overlap with a predicted ORF (ORF129ng) from 
N. gonorrhoeae: 

orf129.pep       IIYEYRWMFLYGALTTLGLTVVAXAGGSVLGLLLALARLIHLEKAGAPMRVLAW 54
                 |||||||||||||||||||||||:||||||||||||||||||||||||||||||
orf129ng   MDFRFDIIYEYRWMFLYGALTTLGLTVVATAGGSVLGLLLALARLIHLEKAGAPMRVLAW 60

orf129.pep ALRKVSLLYVTLFRGTPLFVQIVIWAYVWFPFFV 88
           ||||||||||||||||||||||||||||||||||
orf129ng   ALRKVSLLYVTLFRGTPLFVQIVIWAYVWFPFFVILHTAFLGNAMRQSRRVPDKGRWIAG 120

An ORF129ng nucleotide sequence <SEQ ID 841> was predicted to encode a protein having amino 
acid sequence <SEQ ID 842>: 

  1 MDFRFDIIYE YRWMFLYGAL TTLGLTVVAT AGGSVLGLLL ALARLIHLEK
 51 AGAPMRVLAW ALRKVSLLYV TLFRGTPLFV QIVIWAYVWF PFFVILHTAF
101 LGNAMRQSRR VPDKGRWIAG SLELNCQPRG RKTRGEFPPG ESNLGTEPRN
151 PLSMGQRRFP GCENWYPPQN FIKK*

Further work revealed the following gonococcal sequence <SEQ ID 843>: 



1 ATGGATTTTc gtTTTGACAT TATTTAcgaA TACCGCTGGA TGTTTCTTTA 
51 CGGCGCACTG Acgaccttgg ggctgacggt cgtggcgacg gCGGGCGGTT 

101 CGGtattggG TCTGTTGTTG GCGTTGGCGC GCCTGATTCA CTTGGAAAAA 

151 GCCGGTGCGC CGATGCGCGT GCTGGCGTGG GCGTTGCGTA AGGTTTCGCT 

201 GCTGTACGTT ACCCTGTTCC GGGGTACGCC GCTGTTTGTG CAGATTGTGA 

251 TTTGGGCGTA TGTGTGGTTT CCGTTTTTCG TCCATCCTTC AGACGGCATT 

301 TTGGTCAGCG GCGAGGCGGC AATCGCGCTG CGTCGCGGAT ACGGGCCGCT 

351 GATTGCCGGT TCTTTGGCAC TGATCGCCAA CTCGGGGGCG TATATCTGTG 

401 AGATTTTCCG CGCGGGCATC CAGTCTATAG ACAAAGGACA GATGGAGGCG 

451 GCGTGTTCTT TGGGACTGAC CTATCCGCAG GCGATGCGCT ATGTGATTCT 

501 GCCGCAGGCA TTGCGCCGTA TGCTGCCGCC TTTGGCGAGC GAGTTCATCA 

551 CGCTCTTGAA AGACAGCTCG CTGCTGTCGG TCATTGCTGT GGCGGAGTTG 

601 GCGTATGTTC AGAATACGAT TACGGGCCGG TATTCGGTTT ATGAAGAACC 

651 GCTTTACACC GCCGCCCTGA TTTATCTGTT GATGACGACT TTCTTAGGCT 

701 GGATATTCCT GCGTTTGGAA AAACGTTACA ATCCGCAACA CCGCTGA 

This corresponds to the amino acid sequence <SEQ ID 844; ORF129ng-1>: 

  1 MDFRFDIIYE YRWMFLYGAL TTLGLTVVAT AGGSVLGLLL ALARLIHLEK
 51 AGAPMRVLAW ALRKVSLLYV TLFRGTPLFV QIVIWAYVWF PFFVHPSDGI
101 LVSGEAAIAL RRGYGPLIAG SLALIANSGA YICEIFRAGI QSIDKGQMEA
151 ARSLGLTYPQ AMRYVILPQA LRRMLPPLAS EFITLLKDSS LLSVIAVAEL
201 AYVQNTITGR YSVYEEPLYT VALIYLLMTT FLGWIFLRLE KRYNPQHR*

ORF129ng-1 and ORF129-1 show 99.2% identity in 248 aa overlap: 



orf129-1.pep MDFRFDIIYEYRWMFLYGALTTLGLTVVATAGGSVLGLLLALARLIHLEKAGAPMRVLAW
             ||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
orf129ng-1   MDFRFDIIYEYRWMFLYGALTTLGLTVVATAGGSVLGLLLALARLIHLEKAGAPMRVLAW

orf129-1.pep ALRKVSLLYVTLFRGTPLFVQIVIWAYVWFPFFVHPSDGILVSGEAAIALRRGYGPLIAG
             ||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
orf129ng-1   ALRKVSLLYVTLFRGTPLFVQIVIWAYVWFPFFVHPSDGILVSGEAAIALRRGYGPLIAG

orf129-1.pep SLALIANSGAYICEIFRAGIQSIDKGQMEAARSLGLTYPQAMRYVILPQALRRMLPPLAS
             ||||||||||||||||||||||||||||||| ||||||||||||||||||||||||||||
orf129ng-1   SLALIANSGAYICEIFRAGIQSIDKGQMEAACSLGLTYPQAMRYVILPQALRRMLPPLAS

orf129-1.pep EFITLLKDSSLLSVIAVAELAYVQNTITGRYSVYEEPLYTVALIYLLMTTFLGWIFLRLE
             ||||||||||||||||||||||||||||||||||||||||:|||||||||||||||||||
orf129ng-1   EFITLLKDSSLLSVIAVAELAYVQNTITGRYSVYEEPLYTAALIYLLMTTFLGWIFLRLE

orf129-1.pep KRYNPQHRX
             |||||||||
orf129ng-1   KRYNPQHRX

In addition, ORF129ng-1 is homologous to an ABC transporter from A. fulgidus: 

2650409 (AE001090) glutamine ABC transporter, permease protein (glnP) 
[Archaeoglobus fulgidus ] Length =224 
Score = 132 bits (329), Expect = 2e-30 

Identities = 86/178 (48%), Positives = 103/178 (57%), Gaps = 18/178 (10%) 

Query:  65 VSLLYVTLFRGTPLFVQIVIWAYVWFPFFVHPSDGILVSGEAAIALRRGYGPLIAGSLAL 124
           +S  YV + RGTPL VQI+I       +F  P+ GI +  E A            G +AL
Sbjct:  58 ISTAYVEVIRGTPLLVQILI------VYFGLPAIGINLQPEPA------------GIIAL 99

Query: 125 IANSGAYICEIFRAGIQSIDKGQMEAACSLGLTYPQAMRYVILPQALRRMLPPLASEFIT 184
              SGAYI EI RAGI+SI  GQMEAA SLG+TY QAMRYVI PQA R +LP L +EFI
Sbjct: 100 SICSGAYIAEIVRAGIESIPIGQMEAARSLGMTYLQAMRYVIFPQAFRNILPALGNEFIA 159

Query: 185 LLKDSSLLSVIAVAELAYVQNTITGRYSVYEEPLYTAALIYLLMTTFLGWIFLRLEKR 242
           LLKDSSLLSVI++ EL  V   I         P    AL YL+MT  L  +    +K+
Sbjct: 160 LLKDSSLLSVISIVELTRVGRQIVNTTFNAWTPFLGVALFYLMMTIPLSRLVAYSQKK 217

This analysis, including the identification of transmembrane domains in the two proteins, suggests 
that the proteins from N. meningitidis and N. gonorrhoeae, and their epitopes, could be useful 
antigens for vaccines or diagnostics, or for raising antibodies. 
Example 100 

The following partial DNA sequence was identified in N.meningitidis <SEQ ID 845>: 

1 ..CTGAAAGAAT GCCGTCTGAA AGACCCTGTT TTTATTCCAA ATATCGTTTA 

51 TAAGAACATC GCCATTACTT TCCTGCTCTT GCACGCCGCC GCCGAACTTT 

101 GGCTGCCCGC GCAAACCGCC GGTTTTACCG CGCTCGCCGT CGGCTTCATC 

151 CTGCTCGCCA AGCTGCGTGA gCTTCACCAT CACGAACTCT TACGTAAACA 

201 cTACGTCCGC ACTTATTACy TGCTCCAACT CTTTGCCGCC GCAGgcTAgT 

251 TTGTGGACAG GCGCGGCGwA ATTACAAAAC CTGCCCGCyT CCGCGCCCCT 

301 GCACCTGATT ACCCTCGGCG GCATGATGGG CGGCGTGATG ATGGTGTGGc 

351 TGACCGCCGG ACTGTGGCAC AGCGGCTTTA CCAAACTCGA CTACCCCAAA 

401 CTCTGCCGCA TTGCCGTCCC CATCCTTTTC GCCGCCGCCG TCTCGCGCGC 

451 TTTCTTGrTG AACGTGAACC CGrTATTTTT CATTACCGTT CCTGCGATTC 

501 TGACCGCCGG CGTATTCGTA CTGTATCTTT TCrCGTTTAT ACCGATATTT 

551 CGGGCGAATG CGTTTACAGA CGATCCGGAr TAr 

This corresponds to the amino acid sequence <SEQ ID 846; ORF130>: 

1 . . LKECRLKDPV FIPNIVYKNI AITFLLLHAA AELWLPAQTA GFTALAVGFI 

51 LLAKLRELHH HELLRKHYVR TYYLLQLFAA AGSLWTGAAX LQNLPASAPL 

101 HLITLGGMMG GVMMVWLTAG LWHSGFTKLD YPKLCRIAVP ILFAAAVSRA 

151 FLXNVNPXFF ITVPAILTAA VFVLYLFXFI PIFRANAFTD DPE* 

Further work revealed the complete nucleotide sequence <SEQ ID 847>: 

1 ATGCGGCCGT TTTTCGTCGG CGCGGCGGTG CTTGCCATAC TCGGTGCGCT 

51 GGTGTTTTTC ATCAACCCCG GTGCCATCGT CCTGCACCGC CAAATTTTCT 

101 TGGAACTTAT GCTGCCGGCG GCATACGGCG GTTTTTTGAC TGCGGCTTTG 

151 TTGGACTGGA CGGGTTTTTC GGGTAACCTG AAACCTGTCG CGACTTTGAT 

201 GGCGGCATTA TTGCTCGCCG CATCCGCTAT ACTGCCCTTT TCGCCGCAAA 

251 CTGCCTCGTT TTTCGTCGCC GCCTATTGGC TGGTGTTGCT GCTGTTCTGC 

301 GCCCGGCTGA TTTGGCTAGA CCGAAACACC GACAACTTCG CCCTGCTAAT 

351 GTTACTTGCC GCGTTCACTG TTTTTCAGAC GGCATATGCC GTCAGCGGCG 

401 ATTTGAACCT GTTGCGCGCG CAAGTGCATC TAAATATGGC GGCGGTGATG 

451 TTCGTATCCG TGCGCGTCAG TATTCTTTTG GGCGCGGAAG CCCTGAAAGA 

501 ATGCCGTCTG AAAGACCCTG TTTTTATTCC AAATATCGTT TATAAGAACA 

551 TCGCCATTAC TTTCCTGCTC TTGCACGCCG CCGCCGAACT TTGGCTGCCC 

601 GCGCAAACCG CCGGTTTTAC CGCGCTCGCC GTCGGCTTCA TCCTGCTCGC 

651 CAAGCTGCGT GAGCTTCACC ATCACGAACT CTTACGTAAA CACTACGTCC 

701 GCACTTATTA CCTGCTCCAA CTCTTTGCCG CCGCAGGCTA TTTGTGGACA 

751 GGCGCGGCGA AATTACAAAA CCTGCCCGCC TCCGCGCCCC TGCACCTGAT 

801 TACCCTCGGC GGCATGATGG GCGGCGTGAT GATGGTGTGG CTGACCGCCG 

851 GACTGTGGCA CAGCGGCTTT ACCAAACTCG ACTACCCCAA ACTCTGCCGC 

901 ATTGCCGTCC CCATCCTTTT CGCCGCCGCC GTCTCGCGCG CTTTCTTGAT 

951 GAACGTGAAC CCGATATTTT TCATTACCGT TCCTGCGATT CTGACCGCCG 

1001 CCGTATTCGT ACTGTATCTT TTCACGTTTA TACCGATATT TCGGGCGAAT 

1051 GCGTTTACAG ACGATCCGGA ATAA 

This corresponds to the amino acid sequence <SEQ ID 848; ORF130-1>: 

  1 MRPFFVGAAV LAILGALVFF INPGAIVLHR QIFLELMLPA AYGGFLTAAL
 51 LDWTGFSGNL KPVATLMAAL LLAASAILPF SPQTASFFVA AYWLVLLLFC
101 ARLIWLDRNT DNFALLMLLA AFTVFQTAYA VSGDLNLLRA QVHLNMAAVM
151 FVSVRVSILL GAEALKECRL KDPVFIPNIV YKNIAITFLL LHAAAELWLP
201 AQTAGFTALA VGFILLAKLR ELHHHELLRK HYVRTYYLLQ LFAAAGYLWT
251 GAAKLQNLPA SAPLHLITLG GMMGGVMMVW LTAGLWHSGF TKLDYPKLCR
301 IAVPILFAAA VSRAFLMNVN PIFFITVPAI LTAAVFVLYL FTFIPIFRAN
351 AFTDDPE*

Computer analysis of this amino acid sequence gave the following results: 
Homology with a predicted ORF from N. meningitidis (strain A) 

ORF130 shows 94.3% identity over a 193 aa overlap with an ORF (ORF130a) from strain A of N. meningitidis: 
                                                 10        20        30
orf130.pep                               LKECRLKDPVFIPNIVYKNIAITFLLLHAA
                                         ||||||||||||||:|||||||||||||||
orf130a    LNLLRAQVHLNMAAVMFVSVRVSILLGAEALKECRLKDPVFIPNVVYKNIAITFLLLHAA
              140       150       160       170       180       190

                   40        50        60        70        80        90
orf130.pep AELWLPAQTAGFTALAVGFILLAKLRELHHHELLRKHYVRTYYLLQLFAAAGSLWTGAAX
           |||||||||||||:|||||||||||||||||||||||||||||||||||||| ||||||
orf130a    AELWLPAQTAGFTSLAVGFILLAKLRELHHHELLRKHYVRTYYLLQLFAAAGYLWTGAAK
              200       210       220       230       240       250

                  100       110       120       130       140       150
orf130.pep LQNLPASAPLHLITLGGMMGGVMMVWLTAGLWHSGFTKLDYPKLCRIAVPILFAAAVSRA
           ||||||||||||||||||||:|||||||||||||||||||||||||||||||||||||||
orf130a    LQNLPASAPLHLITLGGMMGSVMMVWLTAGLWHSGFTKLDYPKLCRIAVPILFAAAVSRA
              260       270       280       290       300       310

                  160       170       180       190
orf130.pep FLXNVNPXFFITVPAILTAAVFVLYLFXFIPIFRANAFTDDPEX
            | |||| ||||||||||||||||||::|:||||||||||||||
orf130a    VLMNVNPIFFITVPAILTAAVFVLYLLTFVPIFRANAFTDDPEX
              320       330       340       350

The complete length ORF130a nucleotide sequence <SEQ ID 849> is:

1 ATGCGGCCGT TTTTCGTCGG CGCGGCGGTG CTTGCCATAC TCGGTGCGCT
51 GGTGTTTTTC ATCAACCCCG GTGCCATCGT CCTGCACCGC CAAATTTTCT
101 TGGAACTTAT GCTGCCGGCG GCATACGGCG GTTTTTTGAC TGCGGCTTTG
151 TTGGACTGGA CGGGTTTTTC GGGTAACCTG AAACCTGTCG CGACTTTGAT
201 GGCGGCATTA TTGCTCGCCG CATCCGCTAT ACTGCCCTTT TCGCCGCAAA
251 CTGCCTCGTT TTTCGTCGCC GCCTATTGGC TGGTGTTGCT GCTGTTCTGC
301 GCCCGGCTGA TTTGGCTAGA CCGAAACACC GACAACTTCG CCCTGCTAAT
351 GTTACTTGCC GCGTTCACTG TTTTTCAGAC GGCATATGCC GTCAGCGGCG
401 ATTTGAACCT GTTGCGCGCG CAAGTGCATC TAAATATGGC GGCGGTGATG
451 TTCGTATCCG TGCGCGTCAG TATTCTTTTG GGCGCGGAAG CCCTGAAAGA
501 ATGCCGTCTG AAAGACCCAG TATTCATCCC CAATGTCGTC TATAAAAACA
551 TCGCCATTAC CTTCCTGCTC CTGCACGCCG CCGCCGAACT TTGGCTGCCT
601 GCGCAAACCG CCGGTTTTAC CTCGCTCGCC GTCGGCTTTA TCCTGCTTGC
651 CAAGCTGCGT GAGCTTCACC ATCACGAACT CCTGCGCAAA CACTACGTCC
701 GCACTTATTA CCTGCTCCAA CTCTTTGCCG CCGCAGGCTA TTTGTGGACA
751 GGCGCGGCGA AATTACAAAA CCTGCCCGCC TCCGCGCCCC TGCACCTGAT
801 TACCCTCGGT GGCATGATGG GCAGCGTGAT GATGGTGTGG CTGACTGCCG
851 GACTGTGGCA CAGCGGCTTT ACCAAGCTCG ACTACCCGAA ACTCTGCCGC
901 ATCGCCGTCC CCATCCTNTT CGCCGCCGCC GTTTCGCGCG CTGTTTTAAT
951 GAACGTAAAC CCGATATTCT TCATCACCGT CCCCGCAATT CTGACCGCCG
1001 CCGTGTTCGT GCTTTACCTG CTGACATTCG TACCGATCTT TCGGGCGAAC
1051 GCGTTTACAG ACGATCCGGA ATAA


This encodes a protein having amino acid sequence <SEQ ID 850>:



1 MRPFFVGAAV LAILGALVFF INPGAIVLHR QIFLELMLPA AYGGFLTAAL
51 LDWTGFSGNL KPVATLMAAL LLAASAILPF SPQTASFFVA AYWLVLLLFC
101 ARLIWLDRNT DNFALLMLLA AFTVFQTAYA VSGDLNLLRA QVHLNMAAVM
151 FVSVRVSILL GAEALKECRL KDPVFIPNVV YKNIAITFLL LHAAAELWLP
201 AQTAGFTSLA VGFILLAKLR ELHHHELLRK HYVRTYYLLQ LFAAAGYLWT
251 GAAKLQNLPA SAPLHLITLG GMMGSVMMVW LTAGLWHSGF TKLDYPKLCR
301 IAVPILFAAA VSRAVLMNVN PIFFITVPAI LTAAVFVLYL LTFVPIFRAN
351 AFTDDPE*

ORF130a and ORF130-1 show 98.3% identity in 357 aa overlap:



orf 130a. pep MRPFFVGAAVLAILGALVFFINPGAIVLHRQIFLELMLPAAYGGFLTAALLDWTGFSGNL 
llllllitlliMlllillilllllilllinilllltllMIMIIIIllllllllill 
orf 130-1 MRPFFVGAAVLAILGALVFFINPGAIVLHRQIFLELMLPAAYGGFLTTUVLLDWTGFSGNL 

orf 130a . pep KPVATLMAALLLAASAILPFSPQTASFFVAAYWLVLLLFCARLIWLDRNT DNFALLMLLA 

ItlllllMIMIIMIMIIillllllllilllllllllMlltilinilllllllll 
orf 130-1 KPVATLMAALLLAASAILPFSPQTASFFVAAYWLVLLLFCARLIWLDRNTDNFALLMLLA 



orf130a.pep AFTVFQTAYAVSGDLNLLRAQVHLNMAAVMFVSVRVSILLGAEALKECRLKDPVFIPNVV
            ||||||||||||||||||||||||||||||||||||||||||||||||||||||||||:|
orf130-1    AFTVFQTAYAVSGDLNLLRAQVHLNMAAVMFVSVRVSILLGAEALKECRLKDPVFIPNIV

orfl30a.pep YKN I AIT FLLLHAAAELWLPAQTAGFTSLAVGFI LLAKLRELHHHELLRKH YVRT Y YLLQ 

I I t M I I I M I I I I I I I I I 1 I I I I I I I : I I M I t I I I I I I I I I i I I I i I I I It I I I I I M 
orf 130-1 YKNIAITFLLLHAAAELWLPAQTAGFTALAVGFILLAKLRELHHHELLRKHYVRTYYLLQ 



orf 130a. pep LFAAAGYLWTGAAKLQNLPASAPLHLITLGGMMGSVMMVWLTAGLWHSGFTKLDYPKLCR 
M I I t I t I I I I i I M I I I I I t I I t I I I i I M I I I : I I I I I M 1 I I M 1 I I I t I I I I i i M 
orf 130-1 LFAAAGYLWTGAAKLQNLPASAPLHLITLGGMMGGVMMVWLTAGLWHSGFTKLDYPKLCR 

orf 130a. pep lAVPILFAAAVSRAVLMNVNPIFFITVPAILTAAVFVLYLLTFVPIFRANAFTDDPS 
t I t I I I t I I I I I I I tlllllllllllllllllill[llt:ll:lllllltl]l)ll 
orf 130-1 lAVPILFAAAVSRAFLMNVNPIFFITVPAILTAAVFVLYLFTFrPIFRANAFTDDPE 

Homology with a predicted ORF from N.gonorrhoeae

ORF130 shows 91.7% identity over a 193 aa overlap with a predicted ORF (ORF130ng) from
N.gonorrhoeae:

orf130.pep                                LKECRLKDPVFIPNIVYKNIAITFLLLHAA 30
                                          ||||||||||||||::||||||| ||||||
orf130ng    LNLLRAQVHLNMAAVMFVSVRVSVLLGTETLKECRLKDPVFIPNVIYKNIAIT-LLLHAA 201

orf130.pep  AELWLPAQTAGFTALAVGFILLAKLRELHHHELLRKHYVRTYYLLQLFAAAGSLWTGAAX 90
            |||||||||||||||||||||||||||||||||||||||||||||||||||| ||||||
orf130ng    AELWLPAQTAGFTALAVGFILLAKLRELHHHELLRKHYVRTYYLLQLFAAAGYLWTGAAK 261

orf130.pep  LQNLPASAPLHLITLGGMMGGVMMVWLTAGLWHSGFTKLDYPKLCRIAVPILFAAAVSRA 150
            |||||||||||||||||| |||||||||||||||||||||||||||||| ||||:|||||
orf130ng    LQNLPASAPLHLITLGGMTGGVMMVWLTAGLWHSGFTKLDYPKLCRIAVSILFASAVSRA 321

orf130.pep  FLXNVNPXFFITVPAILTAAVFVLYLFXFIPIFRANAFTDDPE 193
            :| |||| |||||| |||||||:|||: |:|||||||||||||
orf130ng    VLMNVNPIFFITVPEILTAAVFMLYLLTFVPIFRANAFTDDPE 364

An ORF130ng nucleotide sequence <SEQ ID 851> was predicted to encode a protein having amino
acid sequence <SEQ ID 852>:



1 MNKFFTHPMR PFFVGAAVLA ILGALVFFHQ PRRYHPAPPN FLGTYAAGCI
51 RRFFDYRFVG PDGFFRQPET CRYFDGGVVA CCGCFIAVFT ATCRIFRRRL
101 LAGVAAVLRL ADLARRQHRT LRSVDVTAAF TVFQTAYAVS GDLNLLRAQV
151 HLNMAAVMFV SVRVSVLLGT ETLKECRLKD PVFIPNVIYK NIAITLLLHA
201 AAELWLPAQT AGFTALAVGF ILLAKLRELH HHELLRKHYV RTYYLLQLFA
251 AAGYLWTGAA KLQNLPASAP LHLITLGGMT GGVMMVWLTA GLWHSGFTKL
301 DYPKLCRIAV SILFASAVSR AVLMNVNPIF FITVPEILTA AVFMLYLLTF
351 VPIFRANAFT DDPE*

Further work revealed the following gonococcal DNA sequence <SEQ ID 853>:



1 ATGCGCCCGT TTTTCGTCGG TGCGGCAGTA CTTGCCATAC TCGGTGCGTT 

51 GGTGTTTTTT ATCAACCCCG GCGCTATCAT CCTGCACCGC CAAATTTTCT 

101 TGGAACTTAT GCTGCCGGCT GCATACGGCG GTTTTTTGAC TACCGCTTTG 

151 TTGGACCGGA CGGGTTTTTC AGGCAACCTG AAACCTGCCG CTACTTTGAT 

201 GGCGGTGTTG TTGCTTGTTG CGGCTGTTTT ATTGCCGTTT TTACCGCAAC 

251 TTGCCGCATT TTTCGTCGCC GCCTATTGGC TGGTGTTGCT GCTGTTCTGC 

301 GCCTGGCTGA TTTGGCTCGA CCGCAACACC GACAACTTCG CTCTGTTGAT 

351 GTTACTTGCC GCATTTACCG TTTTTCAGAC GGCCTATGCC GTCAGCGGCG 

401 ATTTGAACTT ACTGCGCGCG CAAGTGCATT TGAATATGGC GGCGGTCATG 

451 TTCGTATCCG TCCGCGTCAG CGTCCTTTTG GGCACGGAAA CCCTGAAAGA 

501 ATGCCGTCTG AAAGACCCCG TATTCATCCC CAACGTTATC TATAAAAACA 

551 TCGCCATCAC CCTGCTGCTG CACGCCGCCG CCGAACTTTG GCTGCCCGCG 

601 CAAACCGCCG GTTTTACTGC GCTTGCCGTC GGCTTCATCC TGCTCGCCAA 

651 GCTGCGCGAA CTGCACCATC ACGAACTCTT ACGCAAACAC TACGTCCGCA 

701 CTTATTACCT GCTCCAGCTC TTTGCCGCCG CAGGTTATCT GTGGACAGGC 

751 GCGGCGAAAC TGCAAAACCT GCCCGCCTCC GCGCCCCTGC ACCTGATTAC 

801 CCTCGGCGGC ATGACGGGTG GCGTGATGAT GGTGTGGCTG ACTGCCGGAC 

851 TGTGGCACAG CGGCTTTACC AAACTCGACT ACCCGAAACT CTGCCGCATC 
901 GCCGTCTCCA TCCTTTTCGC CTCCGCCGTT TCGCGCGCTG TTTTAATGAA
951 CGTGAATCCG ATATTCTTCA TCACCGTTCC CGAGATTCTG ACCGCCGCCG
1001 TGTTCATGCT TTACCTGCTG ACGTTCGTAC CGATTTTTCG AGCGAACGCG

1051 tttacagacg atccggaata a
This corresponds to the amino acid sequence <SEQ ID 854; ORF130ng-1>:



1 MRPFFVGAAV LAILGALVFF INPGAIILHR QIFLELMLPA AYGGFLTTAL
51 LDRTGFSGNL KPAATLMAVL LLVAAVLLPF LPQLAAFFVA AYWLVLLLFC
101 AWLIWLDRNT DNFALLMLLA AFTVFQTAYA VSGDLNLLRA QVHLNMAAVM
151 FVSVRVSVLL GTETLKECRL KDPVFIPNVI YKNIAITLLL HAAAELWLPA
201 QTAGFTALAV GFILLAKLRE LHHHELLRKH YVRTYYLLQL FAAAGYLWTG
251 AAKLQNLPAS APLHLITLGG MTGGVMMVWL TAGLWHSGFT KLDYPKLCRI
301 AVSILFASAV SRAVLMNVNP IFFITVPEIL TAAVFMLYLL TFVPIFRANA
351 FTDDPE*

ORF130ng-1 and ORF130-1 show 92.4% identity in 357 aa overlap:



MRP FFVGAAVLAILGALVFFIN PGAI VLHRQI FLELMLPAAYGGFLTAALLDWTG FSGNL 
I I [ I I t t t 1 I M I I I I I I I I i I i I I I : i I I I I I I I I I I M M I t i t t : I I I I I M I i I t 
MRPFFVGAAVLAILGALVFFINPGAIILHRQIFLELMLPAAYGGFLTTALLDRTGFSGNL 

KPVATLMAALLLAASAILPFSPQTASFFVAAYWLVLLLFCARLIWLDRNTDNFALLMLLA 
l|:||IM:lll:|:::|lt II I : i I II I I I I I I I II I 1 I I I I I M I I I I I I i t I I t 
KPAATLMAVLLLVAAVLLPFLPQLAAFFVAAYWLVLLLFCAWLIWLDRNTDNFALLMLLA 

AFTVFQTAYAVSGDLNLLRAQVHLNMAAVMFVSVRVSILLGAEALKECRLKDPVFIPNIV 
I I II II I I I t I I I II M t I I I I I t I I I i II i M I I I I : I I I : I : I I I I II I I I I I I I i: : 
AFTVFQTAYAVSGDLNLLRAQVHLNMAAVMFVSVRVSVLLGTETLKECRLKDPVFIPNVI 

YKNIAITFLLUiAAAELWLPAQTAGFTAIAVGFXLIAKLRELHHHELLRKHYVRTYYLLQ 
lllllll lllllllllllllllllllllllllltllllltllllllllllllllllltl 
YKNIAIT-LLLHAAAELWLPAQTAGFTALAVGFILLAKLRELHHHELLRKHYVRTYYLLQ 

LFAAAGYLWTGAAKLQNLPASAPLHLITLGGMMGGVMMVWLTAGLWHSGFTKLDYPKLCR 
llllltlllllllllttlllinillllllll I 1 I I I II II I I I 1 I I I I I I II I II I II 
LFAAAGYLWTGAAKLQNLPASAPLHLITLGGMTGGVMMVWLTAGLWHSGFTKLDYPKLCR 

lAVPILFAAAVSRAFLMNVNPIFFITVPAILTAAVFVLYLFTFIPIFRANAFTDDPEX 
III I I I I : I I I I I II I i I t I I I t I II I I i II I I : I I I : I t : I I I I I II I I I I I. II 
lAVSILFASAVSRAVLMNVNPIFFITVPEILTAAVFMLYLLTFVPIFRANAFTDDPEX 

Based on this analysis, it is predicted that the proteins from N.meningitidis and N.gonorrhoeae, and
their epitopes, could be useful antigens for vaccines or diagnostics, or for raising antibodies.

Example 101



The following partial DNA sequence was identified in N.meningitidis <SEQ ID 855>: 



1 ATGGAAATTC GGGCAATAAA ATATACGGCA ATGGCTGCGT TGCTTGCATT 

51 TACGGTTGCA GGCTGCCGGC TGGCGGGGTG GTATGAGTGT TCGTCCCTCA

101 CCGGCTGGTG TAAGCCGAGA AAACCGGCTG CCATCGATTT TTGGGATATT 

151 GGCGGCGAGA GTCCGCCGTC TTTAGGGGAC TACGAGATAC CGCTTTCAGA 

201 CGGCAATAGT TCCGTCAGGG CAAACGAATA TGAATCCGCA CAACAATCTT 

251 ACTTTTACAG GAAAATAGGG AAGTTTGAAG C.TGCGGGCT GGATTGGCGT 

301 ACGCGTGACG GCAAACCTTT GATTGAGACG TTCAAACAGG GAGGATTTGA 

351 CTGCTTGGAA AAG. . 

This corresponds to the amino acid sequence <SEQ ID 856; ORF131>: 



1 MEIRAIKYTA MAALLAFTVA GCRLAGWYEC SSLTGWCKPR KPAAIDFWDI 
51 GGESPPSLGD YEIPLSDGNS SVRANEYESA QQSYFYRKIG KFEXCGLDWR 
101 TRDGKPLIET FKQGGFDCLE K. . 

Further work revealed the complete nucleotide sequence <SEQ ID 857>: 



1 ATGGAAATTC GGGCAATAAA ATATACGGCA ATGGCTGCGT TGCTTGCATT 
51 TACGGTTGCA GGCTGCCGGC TGGCGGGGTG GTATGAGTGT TCGTCCCTCA 
101 CCGGCTGGTG TAAGCCGAGA AAACCGGCTG CCATCGATTT TTGGGATATT
151 GGCGGCGAGA GTCCGCCGTC TTTAGGGGAC TACGAGATAC CGCTTTCAGA
201 CGGCAATCGT TCCGTCAGGG CAAACGAATA TGAATCCGCA CAACAATCTT
251 ACTTTTACAG GAAAATAGGG AAGTTTGAAG CCTGCGGGCT GGATTGGCGT
301 ACGCGTGACG GCAAACCTTT GATTGAGACG TTCAAACAGG GAGGATTTGA
351 CTGCTTGGAA AAGCAGGGGT TGCGGCGCAA CGGTCTGTCC GAGCGCGTCC
401 GATGGTAA



This corresponds to the amino acid sequence <SEQ ID 858; ORF131-1>:

1 MEIRAIKYTA MAALLAFTVA GCRLAGWYEC SSLTGWCKPR KPAAIDFWDI
51 GGESPPSLGD YEIPLSDGNR SVRANEYESA QQSYFYRKIG KFEACGLDWR
101 TRDGKPLIET FKQGGFDCLE KQGLRRNGLS ERVRW*
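
The correspondence between a nucleotide listing and its amino acid sequence can be checked with a short script. The following is a minimal sketch, not the method used to generate the sequences in this document: it assumes the standard genetic code (sense codons are assigned the same amino acids in the bacterial table), reads frame 1 only, and uses hypothetical helper names (`clean`, `translate`). Unknown bases, including the dots used in partial sequences, are rendered as `X`.

```python
from itertools import product

# Standard genetic code with codons enumerated in TCAG order (assumption of
# this sketch; only the amino acid assignments matter here).
BASES = "TCAG"
AMINO = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {"".join(c): aa for c, aa in zip(product(BASES, repeat=3), AMINO)}


def clean(listing: str) -> str:
    """Strip position numbers and spacing; keep bases and '.' placeholders."""
    return "".join(ch for ch in listing.upper() if ch.isalpha() or ch == ".")


def translate(listing: str) -> str:
    """Translate frame 1; any codon with an unknown base becomes 'X'."""
    dna = clean(listing)
    usable = len(dna) - len(dna) % 3  # ignore a trailing incomplete codon
    return "".join(CODON_TABLE.get(dna[i:i + 3], "X") for i in range(0, usable, 3))


SEQ_ID_857 = """
1 ATGGAAATTC GGGCAATAAA ATATACGGCA ATGGCTGCGT TGCTTGCATT
51 TACGGTTGCA GGCTGCCGGC TGGCGGGGTG GTATGAGTGT TCGTCCCTCA
101 CCGGCTGGTG TAAGCCGAGA AAACCGGCTG CCATCGATTT TTGGGATATT
151 GGCGGCGAGA GTCCGCCGTC TTTAGGGGAC TACGAGATAC CGCTTTCAGA
201 CGGCAATCGT TCCGTCAGGG CAAACGAATA TGAATCCGCA CAACAATCTT
251 ACTTTTACAG GAAAATAGGG AAGTTTGAAG CCTGCGGGCT GGATTGGCGT
301 ACGCGTGACG GCAAACCTTT GATTGAGACG TTCAAACAGG GAGGATTTGA
351 CTGCTTGGAA AAGCAGGGGT TGCGGCGCAA CGGTCTGTCC GAGCGCGTCC
401 GATGGTAA
"""

protein = translate(SEQ_ID_857)
print(protein)
```

Applied to SEQ ID 857, the sketch gives a 135-residue protein followed by the stop symbol, in agreement with SEQ ID 858.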

Computer analysis of this amino acid sequence gave the following results: 
Homology with a predicted ORF from N.meningitidis (strain A)

ORF131 shows 95.0% identity over a 121aa overlap with an ORF (ORF131a) from strain A of
N. meningitidis:

10 20 30 40 50 60 

orfl31.pep MEIRAIKYTAMAALLAFTVAGCRLAGWYECSSLTGWCKPRKPAAIDFWDIGGESPPSLGD 
I I I I I I I I I i t M I M I I I I I It t I I I I i I I H: I i M I I t I I I I I I I I I I I t I I I I I I 
orf131a MEIRAIKYTAMAALLAFTVAGCRLAGWYECSSLSGWCKPRKPAAIDFWDIGGESPPSLED

10 20 30 40 50 60 



70 80 90 100 110 120 

orf 131 . pep YEIPLSDGNSSVRANEYESAQQSYFYRKIGKFEXCGLDWRTRDGKPLIETFKQGGFDCLE 
I I I I I t I I t I M I I I M I I I I I i I I I I M I I I I I I I i I I I I [ I I i I M I t I Mill: 
orf 131a YEIPLSDGNRSVRANEYESAQQSYFYRKIGKFEACGLDWRTRDGKPLIETFKQEGFDCLK 

70 80 90 100 110 120 



orf 131. pep K 
I 

orf 131a KQGLRRNGLSERVRWX 
130 



The complete length ORF131a nucleotide sequence <SEQ ID 859> is:



1 ATGGAAATTC GGGCAATAAA ATATACGGCA ATGGCTGCGT TGCTTGCATT 

51 TACGGTTGCA GGCTGCCGGT TGGCAGGTTG GTATGAGTGT TCGTCCCTGT 

101 CCGGCTGGTG TAAGCCGAGA AAACCTGCCG CCATCGATTT TTGGGATATT 

151 GGCGGCGAGA GTCCTCCGTC TTTAGAGGAC TACGAGATAC CGCTTTCAGA 

201 CGGCAATCGT TCCGTCAGGG CAAACGAATA TGAATCCGCA CAACAATCTT 

251 ACTTTTACAG GAAAATAGGG AAGTTTGAAG CCTGCGGGTT GGATTGGCGT 

301 ACGCGTGACG GCAAACCTTT GATTGAGACG TTCAAACAGG AAGGTTTTGA 

351 TTGTTTGAAA AAGCAGGGGT TGCGGCGCAA CGGTCTGTCC GAGCGCGTCC 

401 GATGGTAA 

This encodes a protein having amino acid sequence <SEQ ID 860>: 



1 MEIRAIKYTA MAALLAFTVA GCRLAGWYEC SSLSGWCKPR KPAAIDFWDI
51 GGESPPSLED YEIPLSDGNR SVRANEYESA QQSYFYRKIG KFEACGLDWR 
101 TRDGKPLIET FKQEGFDCLK KQGLRRNGLS ERVRW* 

ORF131a and ORF131-1 show 97.0% identity in 135 aa overlap:



orf 131a. pep ME IRAIKYTAMAALLAFTVAGCRLAGWYECSSLSGWCKPRKPAAIDFWDI GGESPPSLED 
I t I I I I t I I I t I I I II II I I I I M I I I 1 II I II : I I M t I II II M I I I) I I M [ I II I 
orf 131-1 MEIRAIKYTAMAALLAFTVAGCRLAGWYECSSLTGWCKPRKPAAIDFWDIGGESPPSLGD 



orf 131a , pep YEIPLSDGNRSVRANEYESAQQSYFYRKIGKFEACGLDWRTRDGKPLIETFKQEGFDCLK 
lillllllilllltlllllllltllllltlllllMMIIIIIIitMMIIt Mill: 
orf 131-1 YEIPLSDGNRSVRANEYESAQQSYFYRKIGKFEACGLDWRTRDGKPLIETFKQGGFDCLE 



orf131a.pep KQGLRRNGLSERVRWX
            ||||||||||||||||
orf131-1    KQGLRRNGLSERVRWX
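
The identity figure quoted for ORF131a and ORF131-1 can be reproduced by counting matching positions. The following is a minimal sketch under stated assumptions: the two sequences are compared position by position without gaps (appropriate here because both are 135 residues long and align end to end), the terminal stop symbol is excluded, and `percent_identity` is a hypothetical helper name. It is not the alignment program used to produce the figures in this document.

```python
def percent_identity(a: str, b: str) -> float:
    """Percentage of identical positions in two gap-free, equal-length sequences."""
    if len(a) != len(b):
        raise ValueError("sequences must already be aligned to the same length")
    matches = sum(x == y for x, y in zip(a, b))
    return 100.0 * matches / len(a)


# SEQ ID 860 (ORF131a) and SEQ ID 858 (ORF131-1), without the stop symbol.
ORF131A = "".join("""
MEIRAIKYTA MAALLAFTVA GCRLAGWYEC SSLSGWCKPR KPAAIDFWDI
GGESPPSLED YEIPLSDGNR SVRANEYESA QQSYFYRKIG KFEACGLDWR
TRDGKPLIET FKQEGFDCLK KQGLRRNGLS ERVRW
""".split())

ORF131_1 = "".join("""
MEIRAIKYTA MAALLAFTVA GCRLAGWYEC SSLTGWCKPR KPAAIDFWDI
GGESPPSLGD YEIPLSDGNR SVRANEYESA QQSYFYRKIG KFEACGLDWR
TRDGKPLIET FKQGGFDCLE KQGLRRNGLS ERVRW
""".split())

identity = percent_identity(ORF131A, ORF131_1)
print(f"{identity:.1f}% identity in {len(ORF131A)} aa overlap")
```

The two sequences differ at four positions (34, 59, 114 and 120), i.e. 131 of 135 residues are identical, which rounds to the 97.0% stated above.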

Homology with a predicted ORF from N.gonorrhoeae

ORF131 shows 89.3% identity over 121 aa overlap with a predicted ORF (ORF131ng) from
N. gonorrhoeae:

orf131.pep  MEIRAIKYTAMAALLAFTVAGCRLAGWYECSSLTGWCKPRKPAAIDFWDIGGESPPSLGD 60
            ||||:||||| |||:||||||||||||||| ||:||||||||||||||||||||| || |
orf131ng    MEIRVIKYTATAALFAFTVAGCRLAGWYECLSLSGWCKPRKPAAIDFWDIGGESPLSLED 60

orf131.pep  YEIPLSDGNSSVRANEYESAQQSYFYRKIGKFEXCGLDWRTRDGKPLIETFKQGGFDCLE 120
            ||||||||| |||||||||||:||||||||||| |||||||||||||:| ||| ||||||
orf131ng    YEIPLSDGNRSVRANEYESAQKSYFYRKIGKFEACGLDWRTRDGKPLVERFKQEGFDCLE 120

orf131.pep  K 121
            |
orf131ng    KQGLRRNGLSERVRW 134

A complete length ORF131ng nucleotide sequence <SEQ ID 861> was predicted to encode a
protein having amino acid sequence <SEQ ID 862>:

1 MEIRVIKYTA TAALFAFTVA GCRLAGWYEC LSLSGWCKPR KPAAIDFWDI
51 GGESPLSLED YEIPLSDGNR SVRANEYESA QKSYFYRKIG KFEACGLDWR
101 TRDGKPLVER FKQEGFDCLE KQGLRRNGLS ERVRW*

Further work revealed the following gonococcal DNA sequence <SEQ ID 863>: 

1 ATGGAAATTC GGGTAATAAA ATATACGGCA ACGGCTGCGT TGTTTGCATT 

51 TACGGTTGCA GGCTGCCGGC TGGCGGGGTG GTATGAGTGT TCGTCCTTGT 

101 CCGGCTGGTG TAAGCCGAGA AAACCTGCCG CCATCGATTT TTGGGATATT

151 GGCGGCGAGA GtccgctGTC TTTAGAGGAC TACGAGATAC CGCTTTCAGA 

201 CGGCAATCGT TCCGTCAGGG CAAACGAATA TGAATCCGCG CAAAAATCTT 

251 ACTTTTATAG GAAAATAGGG AAGTTTGAAG CCTGCGGGTT GGATTGGCGT 

301 ACGCGTGACG GCAAACCTTT GGTTGAGAGG TTCAAACAGG AAGGTTTCGA 

351 CTGTTTGGAA AAGCAGGGGT TGCGGCGCAA CGGCCTGTCC GAGCGCGTCC

401 GATGGTAA 

This corresponds to the amino acid sequence <SEQ ID 864; ORF131ng-1>:

1 MEIRVIKYTA TAALFAFTVA GCRLAGWYEC SSLSGWCKPR KPAAIDFWDI
51 GGESPLSLED YEIPLSDGNR SVRANEYESA QKSYFYRKIG KFEACGLDWR
101 TRDGKPLVER FKQEGFDCLE KQGLRRNGLS ERVRW*

ORF131ng-1 and ORF131-1 show 92.6% identity in 135 aa overlap:

orf131ng-1.pep  MEIRVIKYTATAALFAFTVAGCRLAGWYECSSLSGWCKPRKPAAIDFWDIGGESPLSLED
                ||||:||||| |||:||||||||||||||||||:||||||||||||||||||||| || |
orf131-1        MEIRAIKYTAMAALLAFTVAGCRLAGWYECSSLTGWCKPRKPAAIDFWDIGGESPPSLGD

orf131ng-1.pep  YEIPLSDGNRSVRANEYESAQKSYFYRKIGKFEACGLDWRTRDGKPLVERFKQEGFDCLE
                |||||||||||||||||||||:|||||||||||||||||||||||||:| ||| ||||||
orf131-1        YEIPLSDGNRSVRANEYESAQQSYFYRKIGKFEACGLDWRTRDGKPLIETFKQGGFDCLE

orf131ng-1.pep  KQGLRRNGLSERVRWX
                ||||||||||||||||
orf131-1        KQGLRRNGLSERVRWX

Based on the presence of a predicted prokaryotic membrane lipoprotein lipid attachment site, it is 
predicted that the proteins from N.meningitidis and N.gonorrhoeae, and their epitopes, could be
useful antigens for vaccines or diagnostics, or for raising antibodies.
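
A prediction of this kind can be illustrated with a simple pattern search. The sketch below is an illustration only: it assumes the commonly quoted PROSITE consensus for the prokaryotic membrane lipoprotein lipid attachment site (PS00013), {DERK}(6)-[LIVMFWSTAG](2)-[LIVMFYSTAGCQ]-[AGS]-C, which is not stated in this document, and it applies only a simplified form of the positional rule (the cysteine is sought within the first 35 residues). The names `LIPOBOX` and `lipid_attachment_site` are hypothetical.

```python
import re
from typing import Optional

# Assumed PS00013-style consensus written as a regular expression:
# six residues that are not D/E/R/K, then the lipobox ending in Cys.
LIPOBOX = re.compile(r"[^DERK]{6}[LIVMFWSTAG]{2}[LIVMFYSTAGCQ][AGS]C")


def lipid_attachment_site(protein: str, window: int = 35) -> Optional[int]:
    """Return the 1-based position of the candidate lipid-modified cysteine."""
    match = LIPOBOX.search(protein[:window])
    return match.end() if match else None


# N-terminal 50 residues of ORF131-1 (SEQ ID 858).
ORF131_1_NTERM = "MEIRAIKYTAMAALLAFTVAGCRLAGWYECSSLTGWCKPRKPAAIDFWDI"

site = lipid_attachment_site(ORF131_1_NTERM)
print(site)
```

Under these assumptions the pattern matches residues 12-22 of ORF131-1 (AALLAFTVAGC), placing the candidate cysteine at position 22.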
Example 102

The following partial DNA sequence was identified in N.meningitidis <SEQ ID 865>:

1 ATGAAACACA TCCATATTAT CGGTATCGGC GGCACGTTTA TGGGCGGGCT 

51 TGCCGCCATT GCCAAAGAAG CGGGGTTTGA AGTCAGCGGT TGCGACGCGA 

101 AGATGTATCC GCCGATGAGC ACCCAGCTCG AAGCCTTGGG TATAGACGTG 

151 TATGAAGGCT TCGATGCCGC TCAGTTGGAC GAATTTAAAG CCGACGTTTA 

201 CGTTATCGGC AATGTCGCCA AGCGCGGGAT GGATGTGGTT GAAGCGATTT 

251 TGAACCTCGG CCTGCCtTAT ATtTcCGGCC CGCAATGGCT GTCGGAAAAC 

301 GTGCTGCACC ATCATTGGGT ACTCGGTGTG GCGGGGACgC ACGGCAAAAC 

351 GACCACCGCC TCCATGCTCG CATGGGTCTT GGAATATgCC GGCCTCGCGC 

401 CGGGCTTCCT TATtGGCGGC GTACC.GGAA AATttCGGCG TTTCCGCCCG 

451 CCTGCCGCAA ACGCCGCGCC AAGACCCGAA CAGCCAATCG CCGTTTTTcG 

501 TCATCGAAGC CGACGAATAC GACACCGCCT TTtTCGACAA ACGTTCTAAA 

551 TtCGTGCATT ACCGTCCGCG TACCGCCGTG TTGAACAATC TGGAATTCGA 

601 CCACGCCGAC ATCTTTGCCG ACTTGGGCGC GATACAGACc CAGTTCCACT 

651 ACCTCGTGCG TACCGTGCCG TCTGAAGGCT TAATCGTCTG CAACGGACGG 

701 CAGCAAAGCC TGCAAGATAC TTTGGACAAA GGCTGCTGGA CGCCGGTGGA 

751 AAAATTCGGC ACGGAACACG GCTGGCA. . 

This corresponds to the amino acid sequence <SEQ ID 866; ORF132>: 

1 MKHIHIIGIG GTFMGGLAAI AKEAGFEVSG CDAKMYPPMS TQLEALGIDV 

51 YEGFDAAQLD EFKADVYVIG NVAKRGMDVV EAILNLGLPY ISGPQWLSEN

101 VLHHHWVLGV AGTHGKTTTA SMLAWVLEYA GLAPGFLIGG VXGKFRRFRP 

151 PAANAAPRPE QPIAVFRHRS RRIRHRLFRQ TFXIRALPSA YRRVEQSGIR 

201 PRRHLCRLGR DTDPVPLPRA YRAVXRLNRL QRTAAKPARY FGQRLLDAGG 

251 KIRHGTRLA. . 

Further work revealed the complete nucleotide sequence <SEQ ID 867>: 

1 ATGAAACACA TCCATATTAT CGGTATCGGC GGCACGTTTA TGGGCGGGCT 

51 TGCCGCCATT GCCAAAGAAG CGGGGTTTGA AGTCAGCGGT TGCGACGCGA 

101 AGATGTATCC GCCGATGAGC ACCCAGCTCG AAGCCTTGGG TATAGACGTG 

151 TATGAAGGCT TCGATGCCGC TCAGTTGGAC GAATTTAAAG CCGACGTTTA 

201 CGTTATCGGC AATGTCGCCA AGCGCGGGAT GGATGTGGTT GAAGCGATTT 

251 TGAACCTCGG CCTGCCTTAT ATTTCCGGCC CGCAATGGCT GTCGGAAAAC 

301 GTGCTGCACC ATCATTGGGT ACTCGGTGTG GCGGGGACGC ACGGCAAAAC 

351 GACCACCGCC TCCATGCTCG CATGGGTCTT GGAATATGCC GGCCTCGCGC 

401 CGGGCTTCCT TATTGGCGGC GTACCGGAAA ATTTCGGCGT TTCCGCCCGC 

451 CTGCCGCAAA CGCCGCGCCA AGACCCGAAC AGCCAATCGC CGTTTTTCGT 

501 CATCGAAGCC GACGAATACG ACACCGCCTT TTTCGACAAA CGTTCTAAAT 

551 TCGTGCATTA CCGTCCGCGT ACCGCCGTGT TGAACAATCT GGAATTCGAC 

601 CACGCCGACA TCTTTGCCGA CTTGGGCGCG ATACAGACCC AGTTCCACTA 

651 CCTCGTGCGT ACCGTGCCGT CTGAAGGCTT AATCGTCTGC AACGGACGGC 

701 AGCAAAGCCT GCAAGATACT TTGGACAAAG GCTGCTGGAC GCCGGTGGAA 

751 AAATTCGGCA CGGAACACGG CTGGCAGGCC GGCGAAGCCA ATGCCGACGG 

801 CTCGTTCGAC GTGTTGCTCG ACGGCAAAAC CGCCGGACGC GTCAAATGGG 

851 ATTTGATGGG CAGGCACAAC CGCATGAACG CGCTCGCCGT CATTGCCGCC 

901 GCGCGTCATG TCGGTGTCGA TATTCAGACC GCCTGCGAAG CCTTGGGCGC 

951 GTTTAAAAAC GTCAAACGCC GGATGGAAAT CAAAGGCACG GCAAACGGCA 

1001 TCACCGTTTA CGACGACTTC GCCCACCACC CGACCGCCAT CGAAACCACG 

1051 ATTCAAGGTT TGCGCCAACG CGTCGGCGGC GCGCGCATCC TCGCCGTCCT 

1101 CGAACCGCGT TCCAACACGA TGAAGCTGGG CACGATGAAG TCCGCCCTGC 

1151 CTGTAAGCCT CAAAGAAGCC GACCAAGTGT TCTGCTACGC CGGCGGCGTG 

1201 GACTGGGACG TCGCCGAAGC CCTCGCGCCT TTGGGCGGCA GGCTGAACGT 

1251 CGGCAAAGAC TTCGATGCCT TCGTTGCCGA AATCGTGAAA AACGCCGAAG 

1301 TAGGCGACCA TATTTTGGTG ATGAGCAACG GCGGTTTCGG CGGAATACAC 

1351 GGAAAGCTGC TGGAAGCTTT GAGATAG 

This corresponds to the amino acid sequence <SEQ ID 868; ORF132-1>:

1 MKHIHIIGIG GTFMGGLAAI AKEAGFEVSG CDAKMYPPMS TQLEALGIDV
51 YEGFDAAQLD EFKADVYVIG NVAKRGMDVV EAILNLGLPY ISGPQWLSEN
101 VLHHHWVLGV AGTHGKTTTA SMLAWVLEYA GLAPGFLIGG VPENFGVSAR
151 LPQTPRQDPN SQSPFFVIEA DEYDTAFFDK RSKFVHYRPR TAVLNNLEFD
201 HADIFADLGA IQTQFHYLVR TVPSEGLIVC NGRQQSLQDT LDKGCWTPVE
251 KFGTEHGWQA GEANADGSFD VLLDGKTAGR VKWDLMGRHN RMNALAVIAA
301 ARHVGVDIQT ACEALGAFKN VKRRMEIKGT ANGITVYDDF AHHPTAIETT
351 IQGLRQRVGG ARILAVLEPR SNTMKLGTMK SALPVSLKEA DQVFCYAGGV
401 DWDVAEALAP LGGRLNVGKD FDAFVAEIVK NAEVGDHILV MSNGGFGGIH
451 GKLLEALR*

Computer analysis of this amino acid sequence gave the following results: 

Homology with the hypothetical o457 protein of E.coli (accession number U14003)
ORF132 and o457 show 58% aa identity in 140 aa overlap:

Orf132: 4 IHIIGIGGTFMGGLAAIAKEAGFEVSGCDAKMYPPMSTQLEALGIDVYEGFDAAQLDEFK 63

IHI+GI GTFMGGLA +A++ G EV+G DA +YPPMST LE GI++ +G+DA+QL+ +
o457: 3 IHILGICGTFMGGIAMLARQLGHEVTGSDANVYPPMSTLLEKQGIELIQGYDASQLEP-Q 61

Orf132: 64 ADVYVIGNVAKRGMDVVEAILNLGLPYISGPQWLSENVLHHHWVLGVAGTHGKTTTASML 123

D+ +IGN RG VEA+L +PY+SGPQWL + VL WVL VAGTHGKTTTA M
o457: 62 PDLVIIGNAMTRGNPCVEAVLEKNIPYMSGPQWLHDEVLRDRWVLAVAGTHGKTTTAGMA 121

Orf132: 124 AWVLEYAGLAPGFLIGGVXG 143

W+LE G PGF+IGGV G
o457: 122 TWILEQCGYKPGFVIGGVPG 141



Homology with a predicted ORF from N.meningitidis (strain A)

ORF132 shows 74.6% identity over a 189aa overlap with an ORF (ORF132a) from strain A of N.
meningitidis:



10 20 30 40 50 60 

orf 132 . pep MKHIHIIGIGGTFMGGLAAIAKEAGFEVSGCDAKMYPPMSTQLEALGIDVYEGFDAAQLD 
illllillMllitlhltllllini I t I I M I I I t t I f I I I I I t I llltll:ll]l 
orf 132a MKHIHIIGIGGTFMGGIAAIAKEAGFEXSGCDAKMYPPMSTQLEALGIGVYEGFDTAQLD 

10 20 30 40 50 60 



70 80 90 100 110 120 

orf 132 . pep EFKADVYVIGNVAKRGMDVVEAILNLGLPYISGPQWLSENVLHHHWVLGVAGTHGKTTTA 
I I i I I I I I 1 I M If I M I I ) It I I I I I I I I 1 I I [ I I : I I 11(11 I I I I I I I M I I I 
orf 132a EFKADVYVIGNVAKRGMDWEAILNRGLPYISGPQWLAENXLHHHWXLGVAXTHGKTTTA 

70 80 90 100 110 120 



130 140 150 160 

orf 132 . pep SMLAWVLEYAGLAPGFLIGGVXGKFR RFRPPAANAAPRPEQPI AVFR 

I I I I I I t I I I I I t i I I t I I I : I I : I : I : : I : t I 

O r f 1 3 2 a SMLAWVLE YAGLAPGFX IGGVPENFS VSARL- PQT PRQDPNSQS PFFVIEADE YDTAFFD 

130 140 150 160 170 



170 180 190 200 210 220

orf132.pep HRSRRIRHRLFRQTFXIRALPSAYRRVEQSGIRPRRHLCRLGRDTDPVPLPRAYRAVXRL
:||: :::|

orf132a KRSKFVHYRPRTAVLNNLEFDHADIFADLGAIQTQFHHLVRTVPSEGLIVCNGRQQSLQD
180 190 200 210 220 230

The complete length ORF132a nucleotide sequence <SEQ ID 869> is:

1 ATGAAACACA TCCACATTAT CGGTATCGGC GGCACGTTTA TGGGTGGGAT 

51 TGCCGCCATT GCCAAAGAAG CAGGGTTTGA ANTCAGCGGT TGCGATGCGA 

101 AGATGTATCC GCCGATGAGC ACCCAGCTCG AAGCCTTGGG CATAGGCGTG 

151 TATGAAGGCT TCGACACCGC GCAGTTGGAC GAATTTAAAG CCGACGTTTA 

201 CGTTATCGGC AATGTCGCCA AGCGCGGGAT GGATGTGGTT GAAGCGATTT 

251 TGAACCGTGG GCTGCCTTAT ATTTCCGGCC CGCAATGGCT GGCTGAAAAC 

301 NTGCTGCACC ATCATTGGNN ACTCGGCGTG GCGGNGACGC ACGGCAAAAC 

351 GACCACCGCG TCTATGCTCG CGTGGGTTTT GGAATATGCC GGACTCGCAC 

401 CGGGCTTCNT TATCGGCGGC GTACCGGAAA ACTTCAGCGT TTCCGCCCGC 

451 CTGCCGCAAA CGCCGCGCCA AGACCCGAAC AGCCAATCGC CGTTTTTCGT 

501 CATTGAAGCC GACGAATACG ACACCGCGTT TTTCGACAAA CGCTCCAAAT 

551 TCGTGCATTA CCGTCCGCGT ACCGCCGTGT TGAACAATCT GGAATTCGAC 

601 CACGCCGACA TCTTCGCCGA TTTGGGCGCG ATACAGACCC AGTTCCACCA 

651 CCTCGTGCGT ACCGTGCCGT CTGAAGGCCT CATCGTCTGC AACGGACGGC 

701 AGCAAAGCCT GCAAGACACT TTGGACAAAG GCTGCTGGAC GCCGGTGGAA 

751 AAATTCGGCA CGGAACACGG CTGGCAGGCC GGCGAAGCCA ATGCCGATGG 
801 CTCGTTCGAC GTGTTGCTTG ACGGCAAAAA AGCCGGACAC GTCGCTTGGA

851 GTTTGATGGG CGGACACAAC CGCATGAACG CGCTCGCNGT CATCGCCGCC

901 GCGCGTCATG CCGGAGTNGA CATTCAGACG GCCTGCGAAG CCTTGAGCAC

951 GTTTAAAAAC GTCAAACGCC GCATGGAAAT CAAAGGCACG GCAAACGGTA

1001 TCACCGTTTA CGACGACTTC GCCCACCATC CGACCGCTAT CGAAACCACG 

1051 ATTCAAGGTT TGCGCCAGCG CGTCGGCGGC GCGCGCATCC TCGCCGTCCT 

1101 CGAACCGCGT TCCAATACGA TGAAGCTGGG TACGATGAAA GCCGCCCTGC 

1151 CCGCAAGCCT CAAAGAAGCC GACCAAGTGT TCTGNTACGC CGGCGGCGCG 

1201 GACTGGGACG TTGCCGAAGC CCTCGCGCCT TTGGGCGGCA GGCTGCACGT 

1251 CGGCAAAGAC TTCGATGCCT TCGTTGCCGA AATCGTGAAA AACGCCGAAG 

1301 CAGGCGACCA TATTTTGGTG ATGAGCAACG GCGGTTTCGG CGGAATACAC 

1351 ACCAAACTGC TGGACGCTTT GAGATAG 

This encodes a protein having amino acid sequence <SEQ ID 870>: 



1 MKHIHIIGIG GTFMGGIAAI AKEAGFEXSG CDAKMYPPMS TQLEALGIGV
51 YEGFDTAQLD EFKADVYVIG NVAKRGMDVV EAILNRGLPY ISGPQWLAEN
101 XLHHHWXLGV AXTHGKTTTA SMLAWVLEYA GLAPGFXIGG VPENFSVSAR
151 LPQTPRQDPN SQSPFFVIEA DEYDTAFFDK RSKFVHYRPR TAVLNNLEFD
201 HADIFADLGA IQTQFHHLVR TVPSEGLIVC NGRQQSLQDT LDKGCWTPVE
251 KFGTEHGWQA GEANADGSFD VLLDGKKAGH VAWSLMGGHN RMNALAVIAA
301 ARHAGVDIQT ACEALSTFKN VKRRMEIKGT ANGITVYDDF AHHPTAIETT
351 IQGLRQRVGG ARILAVLEPR SNTMKLGTMK AALPASLKEA DQVFXYAGGA
401 DWDVAEALAP LGGRLHVGKD FDAFVAEIVK NAEAGDHILV MSNGGFGGIH
451 TKLLDALR*



ORF132a and ORF132-1 show 93.9% identity in 458 aa overlap: 



orf 132a. pep MKHIHIIGIGGTFMGGIAAIAKEAGFEXSGCDAICMYPPMSTQLEALGIGVYEGFDTAQLD 
I I t I I I I I It I I I I I I : i I I I I I I I i I I I M I I i I i I t I I ) I I I I I t i I I I I I: I I I t 
orfl32-l MKHI HI IGIGGT FMGGLAAI AKEAGFEVSGCDAKMYPPMS TQLEALGI DVYEGFDAAQLD 

orf 132a . pep EFKADVYVIGNVAKRGMDVVEAILNRGLPYISGPQWLAENXLHHHWXLGVAXTHGKTTTA 
I I I ) I I t I I I I i I I I I i I I I I I i t I llltlll[|)t:tl lllll till llllltll 
orf 132-1 EFKADVYVIGNVAKRGMDWEAILNLGLPYISGPQWLSENVLHHHWVLGVAGTHGKTTTA 



orf 132a . pep SMLAWVLEYAGLAPGFXIGGVPENFSVSARLPQTPRQDPNSQSPFFVIEADEYDTAFFDK 
I I I I t I I I t I I I II II I II I I I I I : I I I II I II II I I 11 M I I I M I II I I M t II I I t 
orf 132-1 SMLAWVLEYAGLAPGFLIGGVPENFGVSARLPQTPRQDPNSQSPFFVIEADEYDTAFFDK 



orf 132a . pep RSKFVHYRPRTAVLNNLEFDHADIFADLGAIQTQFHHLVRTVPSEGLIVCNGRQQSLQDT 
I I I I t I I I I I I I I II I I I I I II I I I I I I t I I I I I I I : I I I I I I II I I I I I I I I I I I I II I 
orf 132-1 RSKFVHYRPRTAVLNNLEFDHADIFADLGAIQTQFHYLVRTVPSEGLIVCNGRQQSLQDT 



orf 132a. pep LDKGCWTPVEKFGTEHGWQAGEANADGSFDVLLDGKKAGHVAWSLMGGHNRMNALAVIAA 
I II I I It II I I I I I I I II II I t t II I I I t t I II I I I II : I I : II I II I I I i I M I I t 
orfl32-l LDKGCWT PVEKFGTEHGWQAGEANADGSFDVLLDGKTAGRVKWDLMGRHNRMNALAVI AA 



orf 1 32a . pep ARHAGVDIQTACEALSTFKNVKRRMEIKGTANGITVYDDFMHPTAIETTIQGLRQRVGG 
I II : I II [ I I II I I I :: M II I I I II II I I t ) I I II I t I I i I II I I I i I I II I I I II I II 
orf 132-1 ARHVGVDIQTACEALGAFKNVKRRMEIKGTANGITVYDDFAHHPTAIETTIQGLRQRVGG 

orf 132a . pep ARILAVLEPRSNTMKLGTMKAALPASLKEADQVFXYAGGADWDVAEALAPLGGRLHVGKD 
i i I I I I II I I I I I II I I II I : I I I : I I I I I I I II I I I I : I I I I I I I I I I I I I I i: I I I I 
orf 132-1 ARILAVLEPRSNTMKLGTMKSALPVSLKEADQVFCYAGGVDWDVAEALAPLGGRLNVGKD 



orf 132a . pep FDAFVAEIVKNAEAGDHILVMSNGGFGGIHTKLLDALRX 
I I I I I I I I i I I II : I II I I It I I I I I I I I I 111:1111 
orf 132-1 FDAFVAEIVKNAEVGDHILVMSNGGFGGIHGKLLEALRX 



Homology with a predicted ORF from N.gonorrhoeae

ORF132 shows 89.6% identity over 259 aa overlap with a predicted ORF (ORF132ng) from N. 
gonorrhoeae: 



orf 132 . pep MKHIHIIGIGGTFMGGLAAIAKEAGFEVSGCDAKMYPPMSTQLEALGIDVYEGFDAAQLD 60 

1 1 1 I I I t I I t I I 1 I I 1 : I I t I I I I I t : It 1 I t I I 1 I 1 I 1 I I I 1 I I I I t I : I I I I I I I I : 
orfl32ng MKHIHIIGIGGTFMGGIAAIAKEAGFKVSGCDAKMYPPMSTQLEALGIGVHEGFDAAQLE 60 






orfl32.pep EFKADVYVIGNVAKRGMDVVEAILNLGLPYISGPQWLSENVLHHHWVLGVAGTHGKTTTA 120 

M: I I : I I I I I I I : I I i i I t I I I I i M M I i! I I M : I I t I t I I I t I I I I i I I 1 I I 1 I I 

orfl32ng EFQADIYVIGNVARRGMDWEAILNRGLPYISGPQWLAENVLHHHWVLGVAGTHGKTTTA 120 

orf 132 -pep SMLAWVLEYAGLAPGFLIGGVXGKFRRFRPPAANAAPRPEQPIAVFRHRSRRIRHRLFRQ 180 

I I [ I I I I I I i I I I I t I I M I I 111111111:1111 IMi t t I I I I I I I I I I I I I I I I 

orf 132ng SMLAWVLEYAGLAPGFLIGGVPGKFRRFRPPTANAASRPEQQIAVFRHRSRRIRHRLFRQ 180 

orf 132 . pep TFXIRALPSAYRRVEQSGIRPRRHLCRLGRDTDPVPLPRAYRAVXRLNRLQRTAAKPARY 240 

I : I I I I M M I I t I I i i I I I I I I I i I I I i I I I I I h t : : I : [ ) I I i I I I I I I I 

orfl32ng tLQIRALSPAYRRVEQSGIRPRRHLRRLGRDTDPVPPPRAHRTIRRPHRLQRTAAKPARY 240 



orf 132. pep FGQRLLDAGGKIRHGTRLA 259 

I I 1 I I I I I I M I I I I 1 I I 
orfl32ng FGQRLLDAGGKIRHRTRLADW 261 

An ORF132ng nucleotide sequence <SEQ ID 871> was predicted to encode a protein having amino 
acid sequence <SEQ ID 872>: 



1 MKHIHIIGIG GTFMGGIAAI AKEAGFKVSG CDAKMYPPMS TQLEALGIGV

51 HEGFDAAQLE EFQADIYVIG NVARRGMDVV EAILNRGLPY ISGPQWLAEN

101 VLHHHWVLGV AGTHGKTTTA SMLAWVLEYA GLAPGFLIGG VPGKFRRFRP 

151 PTANAASRPE QQIAVFRHRS RRIRHRLFRQ TLQIRALSPA YRRVEQSGIR 

201 PRRHLRRLGR DTDPVPPPRA HRTIRRPHRL QRTAAKPARY fgqrlldagg 

251 KIRHRTRLAD W* 

Further work revealed the following gonococcal DNA sequence <SEQ ID 873>:

1 ATGAAACACA TCCACATTAT CGGTATCGGC GGCACGTTTA TGGGCGGGAT
51 TGCCGCCATT GCCAAAGAAG CCGGGTTCAA AGTCAGCGGT TGCGACGCGA
101 agatgtatcc GCCGATGAGC ACCCAGCTCG AAGCCTTGGG CATAGGCGTA
151 cacgaaggct TCGATGCCGC GCAGTTGGAA GAATTTCAAG CCGATATTTA
201 cgtcatcggc AATGTCGCCA GGCGCGGGAT GGATGTGGTC GAGGCGATTT
251 tgaaccgtgg GCTGCCTTAT ATTTCCGGCC CGCAATGGCT GGCTGAAAac
301 GTGCtgcacc atcaTTGGgt ACTCGGCGTG GcagggaCGC ACGGcaaAac
351 gaccaCcGcg tCCATGCTCG CCTGGGTCTT GGAATATGCC GGACTCGCGC
401 CGGGCTTCCT CATCGGCGGt gtaccggaAA ATTTCGGCGT TTCCGCCCGC
451 CTACCGCAAA CGCCGCGTCA AGACCCGAAC AGCAAATCGC CGTTTTTCGT
501 CATCGAAGCC GACGAATACG ACACCGCCTT TTTCGACAAA CGCTCCAAAT
551 TCGTGCATTA TCGCCCGCGT ACCGCCGTGT TGAACAATCT GGAATTCGAC
601 CACGCCGACA TCTTCGCCGA CTTGGGCGCG ATACAGACCC AGTTCCACCA
651 CCTCGTGCGC ACCGTACCAT CCGAAGGCCT CATCGTCTGC AACGGACAGC
701 AGCAAAGCCT GCAAGATACT TTGGACAAAG GCTGCTGGAC GCCGGTGGAA
751 AAATTCGGCA CCGGACACGG CTGGCAGATT GGTGAAGTCA ATGCCGACGG
801 CTCGTTCGAC GTATTGCTTG ACGGCAAAAA AGCCGGACAC GTCGCATGGG
851 ATTTGATGGG CGGACACAAC CGCATGAACG CGCTCGCCGT CATCGCTGCC
901 GCACGCCATG CCGGAGTCGA TGTTCAGACG GCCTGCGAAG CCTTGGGTGC
951 GTTTAAAAAC GTCAAACGCC GCATGGAAAT CAAAGGCACG GCAAACGGCA
1001 TCACCGTTTA CGACGATTTC GCCCACCACC CGACCGCCAT CGAAACCACG
1051 ATTCAAGGTT TGCGCCAACG TGTCGGCGGC GCGCGCATCC TCGCCGTCCT
1101 CGAGCCGCGT TCCAACACCA TGAAACTCGG CACGATGAAG TCCGCCCTGC
1151 CCGCAAGCCT CAAAGAAGCC GACCAAGTGT TCTGCTACGC CGGCGGCGCG
1201 GACTGGGACG TTGCCGAAGC CCTCGCGCCT TTGGGCTGCA GGCTGCGCGT
1251 CGGTAAAGAT TTCGATACCT TCGTTGCCGA AATTGTGAAA AACGCCCGAA
1301 CCGGCGACCA TATTTTGGTG ATGAGCAACG GCGGTTTCGG CGGAATACAC
1351 ACCAAACTGC TGGACGCTTT GAGATAG


This corresponds to the amino acid sequence <SEQ ID 874; ORF132ng-1>:

1 MKHIHIIGIG GTFMGGIAAI AKEAGFKVSG CDAKMYPPMS TQLEALGIGV
51 HEGFDAAQLE EFQADIYVIG NVARRGMDVV EAILNRGLPY ISGPQWLAEN
101 VLHHHWVLGV AGTHGKTTTA SMLAWVLEYA GLAPGFLIGG VPENFGVSAR
151 LPQTPRQDPN SKSPFFVIEA DEYDTAFFDK RSKFVHYRPR TAVLNNLEFD
201 HADIFADLGA IQTQFHHLVR TVPSEGLIVC NGQQQSLQDT LDKGCWTPVE
251 KFGTGHGWQI GEVNADGSFD VLLDGKKAGH VAWDLMGGHN RMNALAVIAA
301 ARHAGVDVQT ACEALGAFKN VKRRMEIKGT ANGITVYDDF AHHPTAIETT
351 IQGLRQRVGG ARILAVLEPR SNTMKLGTMK SALPASLKEA DQVFCYAGGA
401 DWDVAEALAP LGCRLRVGKD FDTFVAEIVK NARTGDHILV MSNGGFGGIH
451 TKLLDALR*

ORF132ng-1 and ORF132-1 show 93.2% identity in 458 aa overlap:



orf 132ng-l .pep MKHIHIIGIGGTFMGGIAAIAKEAGFKVSGCDAKMYPPMSTQLEALGIGVHEGFDAAQLE 
i I I I I I t I I I I t I t I I : I I t I I I I t I : I I I t I I I I I I I I I M I I I M I I : I I I t I I I I : 
orf 132-1 MKHIHIIGIGGTFMGGLAAIAKEAGFEVSGCDAKMYPPMSTQLEALGIDVYEGFDAAQLD 






orf 132ng-l . pep EFQADIYVIGNVARRGMDWEAILNRGLPYISGPQWLAENVLHHHWVLGVAGTHGKTTTA 
I t : I I : I I M I I t : I i I I I I I t I t I I I I I I I I t It I : I I I I I I I i i M I I I I I i t I I I I 
orfl32-l EFKADVYVI GNVAKRGMDWEAI LNLGLP YI SGPQWLSENVLHHHWVLGVAGTHGKTTTA 

orf 132ng-l . pep SMLAWVLEYAGLAPGFLIGGVPENFGVSARLPQTPRQDPNSKSPFFVIEADEYDTAFFDK 
I t I i I i I I I I t t I I i I I I I I I I I I [ I I I I i It I I t t I I I I I : i I I I I I I i I M I I I I t I t 
orf 132-1 SMLAWVLEYAGLAPGFLIGGVPENFGVSARLPQTPRQDPNSQSPFFVIEADEYDTAFFDK 






orf 132ng-l . pep RSKFVHYRPRTAVLNNLEFDHMIFADLGAIQTQFHHLVRTVPSEGLIVCNGOQQSLQDT 
I t I 11 II t II II t I I I M I I t I t I I I I I I t t I I t I t : t I M I I I I I I t I I I I: I t I I I I I 
orf 132-1 RSKFVHYRPRTAVLNNLEFDHADIFADLGAIQTQFHYLVRTVPSEGLIVCNGRQQSLQDT 






orf 132ng-l . pep LDKGCWTPVEKFGTGHGWQIGEVNADGSFDVLLDGKKAGHVAWDLMGGHNRMNALAVIAA 
I I I I I II I 1 I II M I I t I I I : t I I M I I I I I I I t 11:1 t M It I I I I I II I I t I I 
orf 132-1 LDKGCWTPVEKFGTEHGWQAGEANADGSFDVLLDGKTAGRVKWDLMGRHNRMNALAVIAA 






orf132ng-1.pep ARHAGVDVQTACEALGAFKNVKRRMEIKGTANGITVYDDFAHHPTAIETTIQGLRQRVGG
I II: II t : it I I I II t t I I I I I I I I M I t I I I I 11 t I I I I I I I I I t t II 11 t It t I I I I I
orf132-1 ARHVGVDIQTACEALGAFKNVKRRMEIKGTANGITVYDDFAHHPTAIETTIQGLRQRVGG

orf132ng-1.pep ARILAVLEPRSNTMKLGTMKSALPASLKEADQVFCYAGGADWDVAEALAPLGCRLRVGKD
lllllllllllllllllltlllll:llltlllllllili:llllllllllll It Mil
orf132-1 ARILAVLEPRSNTMKLGTMKSALPVSLKEADQVFCYAGGVDWDVAEALAPLGGRLNVGKD

orf132ng-1.pep FDTFVAEIVKNARTGDHILVMSNGGFGGIHTKLLDALRX
ll:llllll]M::llllllllllllllll 111:1111
orf132-1 FDAFVAEIVKNAEVGDHILVMSNGGFGGIHGKLLEALRX
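
The identity figure quoted for an alignment such as the one above can be checked by counting matching columns. The following Python sketch is illustrative only and is not the alignment program used for this analysis: the function name `percent_identity` is hypothetical, and it assumes the two rows are already aligned to equal length, with `-` marking a gap. The example rows are the first 60 columns of the ORF132ng-1/ORF132-1 alignment.

```python
def percent_identity(row_a: str, row_b: str) -> tuple[int, int, float]:
    """Return (matches, columns, percent identity) for two pre-aligned rows."""
    if len(row_a) != len(row_b):
        raise ValueError("aligned rows must have equal length")
    # A column counts as a match only if both residues are identical
    # and the position is not a gap.
    matches = sum(1 for a, b in zip(row_a, row_b) if a == b and a != "-")
    columns = len(row_a)
    return matches, columns, 100.0 * matches / columns


# First alignment block (hypothetical variable names), in 10-residue groups.
NG_BLOCK = ("MKHIHIIGIG" "GTFMGGIAAI" "AKEAGFKVSG"
            "CDAKMYPPMS" "TQLEALGIGV" "HEGFDAAQLE")
MC_BLOCK = ("MKHIHIIGIG" "GTFMGGLAAI" "AKEAGFEVSG"
            "CDAKMYPPMS" "TQLEALGIDV" "YEGFDAAQLD")

if __name__ == "__main__":
    matches, columns, pct = percent_identity(NG_BLOCK, MC_BLOCK)
    print(f"{matches}/{columns} identical ({pct:.1f}%)")
```

For this single block the count is 55 identical residues in 60 columns; the 93.2% figure refers to the full 458 aa overlap.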

In addition, ORF132ng-1 is homologous to a hypothetical E. coli protein:






pir||S56459 hypothetical protein o457 - Escherichia coli >gi|537075 (U14003)
ORF_o457 [Escherichia coli] >gi|1790680 (AE000494) hypothetical 48.5 kD protein
in fbp-pmba intergenic region [Escherichia coli] Length = 457
Score = 474 bits (1207), Expect = e-133
Identities = 249/439 (56%), Positives = 294/439 (66%), Gaps = 13/439 (2%)

Query: 22  KEAGFKVSGCDAKMYPPMSTQLEALGIGVHEGFDAAQLEEFQADIYVIGNVARRGMDWE 81
           ++ G +V+G DA +YPPMST LE GI + +G+DA+QLE Q IH -HGN RG VE
Sbjct: 21  RQLGHEVTGSDANVYPPMSTLLEKQGIELIQGYDASQLEP-QPDLVIIGNAMTRGNPCVE 79

Query: 82  AILNRGLPYISGPQWLAENVLHHHWVLGVAGTHGKTTTASMLAWVLEYAGLAPGFLIGGV 141
           A+L + +PY+SGPQWL + VL WVL VAGTHGKTTTA M W+LE G PGF+IGGV
Sbjct: 80  AVLEKNIPYMSGPQWLHDFVLRDRWVLAVAGTHGKTTTAGMATWILEQCGYKPGFVIGGV 139

Query: 142
Sbjct: 140
Query: 202
Sbjct: 191
Query: 262
Sbjct: 251
Query: 321
Sbjct: 311
Query: 380
Sbjct: 371

P NF VSA L 



+S FFVIEADEYD AFFDKRSKFVHY PRT +LNNLEFDH 
-GESDFFVIEADEYDCAFFDKRSKFVHYCPRTLILNNLEFDH 190 



ADIF DL AIQ QFHHLVR VP +G 1+ 



+L+ T+ GCW+ E G 



WQ 



++ D S ++VLLDG+K G V W L+G HN N L lAAARH GV 



A ALG+F N 



•fRR+E++G ANG+TVYDDFAHHPTAI T+ LR +VGG ARI+AVLEPRSNTMK+G 



K L SL AD+VF 



W VAE 



D DT 



+VK A+ GDHI 



Query: 439 LVMSNGGFGGIHTKLLDAL 457
           LVMSNGGFGGIH KLLD L
Sbjct: 431 LVMSNGGFGGIHQKLLDGL 449
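
The percentages on the BLAST summary line can be reproduced from the raw counts. In this minimal sketch (the helper name `blast_percent` is an assumption, not part of any BLAST API), integer division is used because the reported values of 56%, 66% and 2% match the truncated ratios rather than the rounded ones.

```python
def blast_percent(count: int, length: int) -> int:
    """Percentage as shown on the summary line (truncated, not rounded)."""
    return 100 * count // length


ALIGNMENT_LENGTH = 439  # columns in the reported alignment

summary = {
    "Identities": blast_percent(249, ALIGNMENT_LENGTH),
    "Positives": blast_percent(294, ALIGNMENT_LENGTH),
    "Gaps": blast_percent(13, ALIGNMENT_LENGTH),
}
print(summary)  # {'Identities': 56, 'Positives': 66, 'Gaps': 2}
```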

Based on this analysis, it was predicted that these proteins from N. meningitidis and N. gonorrhoeae,
and their epitopes, could be useful antigens for vaccines or diagnostics, or for raising antibodies.

ORF132-1 (26.4kDa) was cloned in pET and pGex vectors and expressed in E. coli as described
above. The products of protein expression and purification were analyzed by SDS-PAGE. Figure
20A shows the results of affinity purification of the His-fusion protein, and Figure 20B shows the
results of expression of the GST-fusion in E. coli. Purified His-fusion protein was used to immunise
mice, whose sera were used for FACS analysis (Figure 20C) and ELISA (positive result). These
experiments confirm that ORF132 is a surface-exposed protein, and that it is a useful immunogen.

Example 103 

The following partial DNA sequence was identified in N. meningitidis <SEQ ID 875>:

1 ..CCGGGCTATT ACGGCTCGGA TGACGAATTT AAGCGGGCAT TCGGAGAAAA 

51 CTCGCCGACA TmCAAGAAAC ATTGCAACCG GAGCTGCGGG ATTTATGAAC 

101 CCGTATTGAA AAAATACGGC AAAAAGCGCG CCAACAACCA TTCGGTCAGC 

151 ATTAGTGCGG ACTTCGGCGA TTATTTCATG CCGTTCGCCA GCTATTCGCG 

201 CACACACCGT ATGCCCAACA TCCAAGAAAT GTATTTTTCC CAAATCGGCG 

251 ACTCCGGCGT TCACACCGCC TTAAAACCAG AGCGCGCAAA CACTTGGCAA 

301 TTTGGCTTCr ATACCTATAA AAAAGGATTG TTAAAACAAG ATGATACATT 

351 AGGATTAAAA CTGGTCGGCT ACCGCAGCCG CATCGACAAC TACATCCACA 

401 ACGTTTACGG GAAATGGTGG GATTTGAACG GGGATATTCC GAGCTGGGTC 

451 AGCAGCACCG GGCTTGCCTA CACCATCCAA CATCGCrATT TCAwAGACAA

501 AGTGCATCAA nnnnnnnnnn nnnnnnnnnn rmnnTACGAT TATGGGCGTT 

551 TTTTCACCAA CCTTTCTTAC GCCTATCAAA AAAGCACGCA ACCGACCAAC 

601 TTCAGCGATG CGAGCGAATC GCCCAACAAT GCGTCCAAAG AAGACCAACT 

651 CAAACAAGGT TATGGGTTGA GCAGGGTTTC CGCCCTGCCG CGAGATTACG 

701 GACGTTTGGA AGTCGGTACG CGCTGGTTGG GCAACAAACT GACTTTGGGC 

751 GGCGCGATGC GCTATTTCGG CAAGAGCATC CGCGCGACGG CTGAAGAACG 

801 CTATATCGAC GGCACCAACG GGGGAAATAC CAGCAATTTC CGGCAACTGG 

851 GCAAGCGTTC CATCAAACAA ACCGAAACTC TTGCCCGCCA GCCTTTGATT 

901 TTwGATTTTa ACGCCGCTTA CGAGCCGAAG AAAAACCTTA TTTTCCGCGC 

951 CGAAGTCAAA AATCTGTTCG ACAGGCGTTA TATCGATCCG CTCGATGCGG 

1001 GCAATGATGC GGCAAC.GAG CGTTATTACA GCTCGTTCGA CCCGAAAGAC 

1051 AAGGACrrAG ACGTAACGTG TAATGCTGAT AAAACGTTGT GCaACGGCAA 

1101 ATACGGCGGC ACAAGCAAAA GCGTATTGAC CAATTTTGCA CGCGGACGCA 

1151 CCTTTTTgAT GACGATGAGC TACAAGTTTT AA 
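
A listing in this layout (a position label followed by groups of ten residues) can be joined into one continuous string before further analysis. The sketch below is a hedged illustration with a hypothetical function name, `parse_listing`; it assumes that each row starts with its position label and that a leading `..` only marks the open start of a partial sequence.

```python
import re


def parse_listing(text: str) -> str:
    """Join a numbered, 10-residue-grouped listing into one continuous string."""
    chunks = []
    for line in text.splitlines():
        # Remove the leading position label (for example "51"), if any.
        line = re.sub(r"^\s*\d+(\s+|$)", "", line)
        # Remove the ".." that marks the open start of a partial sequence.
        line = line.strip().lstrip(".")
        # Remove the spaces between 10-residue groups.
        chunks.append("".join(line.split()))
    return "".join(chunks)


# First two rows of the listing above.
EXAMPLE = """
1 ..CCGGGCTATT ACGGCTCGGA TGACGAATTT AAGCGGGCAT TCGGAGAAAA

51 CTCGCCGACA TmCAAGAAAC ATTGCAACCG GAGCTGCGGG ATTTATGAAC
"""

sequence = parse_listing(EXAMPLE)
print(len(sequence), sequence[:30])
```

Lower-case and ambiguity characters such as `m`, `r`, `w` and `n` are kept as they appear, since they record uncertain base calls.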

This corresponds to the amino acid sequence <SEQ ID 876; ORF133>: 

1 ..PGYYGSDDEF KRAFGENSPT XKKHCNRSCG IYEPVLKKYG KKRANNHSVS

51 ISADFGDYFM PFASYSRTHR MPNIQEMYFS QIGDSGVHTA LKPERANTWQ 

101 FGFXTYKKGL LKQDDTLGLK LVGYRSRIDN YIHNVYGKWW DLNGDIPSWV 

151 SSTGLAYTIQ HRXFXDKVHQ XXXXXXXXYD YGRFFTNLSY AYQKSTQPTN 

201 FSDASESPNN ASKEDQLKQG YGLSRVSALP RDYGRLEVGT RWLGNKLTLG 

251 GAMRYFGKSI RATAEERYID GTNGGNTSNF RQLGKRSIKQ TETLARQPLI 

301 XDFNAAYEPK KNLIFRAEVK NLFDRRYIDP LDAGNDAAXE RYYSSFDPKD 

351 KDXDVTCNAD KTLCNGKYGG TSKSVLTNFA RGRTFLMTMS YKF* 
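
The relationship between a DNA listing and the amino acid sequence that corresponds to it can be illustrated by a direct translation in reading frame 1. The sketch below is an illustration under stated assumptions rather than the software used here: it applies the standard genetic code, reports any codon containing a character other than A, C, G or T (for example `n`, `r`, `w`, `m` or `.`) as `X`, and reports stop codons as `*`. The name `translate` is hypothetical.

```python
from itertools import product

BASES = "TCAG"
# Standard genetic code, codons ordered TTT, TTC, TTA, TTG, TCT, ...
AMINO_ACIDS = ("FFLLSSSSYY**CC*W" "LLLLPPPPHHQQRRRR"
               "IIIMTTTTNNKKSSRR" "VVVVAAAADDEEGGGG")
CODON_TABLE = {"".join(codon): aa
               for codon, aa in zip(product(BASES, repeat=3), AMINO_ACIDS)}


def translate(dna: str) -> str:
    """Translate frame 1 of a DNA string; ambiguous codons become 'X'."""
    dna = dna.upper()
    protein = []
    for i in range(0, len(dna) - 2, 3):
        # Codons with ambiguity codes or gaps are not in the table.
        protein.append(CODON_TABLE.get(dna[i:i + 3], "X"))
    return "".join(protein)


print(translate("CCGGGCTATTACGGCTCGGATGACGAATTT"))  # PGYYGSDDEF
print(translate("ACATmCAAG"))                       # TXK
```

The first call uses the opening 30 bases of SEQ ID 875; the second shows how the ambiguous codon `TmC` gives rise to an `X` in the translated sequence.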

Further work revealed the further partial DNA sequence <SEQ ID 877>: 

1 GAGGCGCAGA TACAGGTTTT GGAAGATGTG CACGTCAAGG CGAAGCGCGT 

51 ACCGAAAGAC AAAAAAGTGT TTACCGATGC GCGTGCCGTA TCGACCCGTC 

101 AGGATATATT CAAATCCAGC GAAAACCTCG ACAACATCGT ACGCAGCATC 

151 CCCGGTGCGT TTACACAGCA AGATAAAAGC TCGGGCATTG TGTCTTTGAA 

201 TATTCGCGGC GACAGCGGGT TCGGGCGGGT CAATACGATG GTGGACGGCA 

251 TCACGCAGAC CTTTTATTCG ACTTCTACCG ATGCGGGCAG GGCAGGCGGT 



301 TCATCTCAAT TCGGTGCATC TGTCGACAGC AATTTTATTG CCGGACTGGA

351 TGTCGTCAAA GGCAGCTTCA GCGGCTCGGC AGGCATCAAC AGCCTTGCCG

401 GTTCGGCGAA TCTGCGGACT TTAGGCGTGG ATGACGTCGT TCAGGGCAAT

451 AATACCTACG GCCTGCTGCT AAAAGGTCTG ACCGGCACCA ATTCAACCAA

501 AGGTAATGCG ATGGCGGCGA TAGGTGCGCG CAAATGGCTG GAAAGCGGAG

551 CATCTGTCGG TGTGCTTTAC GGGCACAGCA GGCGCAGCGT GGCGCAAAAT

601 TACCGCGTGG GCGGCGGCGG GCAGCACATC GGAAATTTTG GCGCGGAATA

651 TTTGGAACGG CGCAAGCAGC GATATTTTGT ACAAGAGGGT GCTTTGAAAT

701 TCAATTCCGA CAGCGGAAAA TGGGAGCGGG ATTTACAAAG GCAACAGTGG

751 AAATACAAGC CGTATAAAAA TTACAACAAC CAAGAACTAC AaAAATACAT

801 CGAAGAGCAT GACAAAAGCT GGCGGGAAAA CCTg.CaCCG CAATACGACA

851 TTACCCCCAT CGATCCGTCC AGCCTGAAGC AGCAGTCGGC AGGCAATCTG

901 TTTAAATTGG AATACGACGG CGTATTCAAT AAATACACGG CGCAATTTCG

951 CGATTTAAAC ACCAAAATCG GCAGCCGCAA AATCATCAAC CGCAATTATC

1001 AGTTCAATTA CGGTTTGTCT TTGAACCCGT ATACCAACCT CAATCTGACC

1051 GCAGCCTACA ATTCGGGCAG GCAGAAATAT CCGAAAGGGT CGAAGTTTAC

1101 AGGCTGGGGG CTTTTAAAGG ATTTTGAAAC CTACAACAAC GCGAAMTCC

1151 TCGACCTCAA CAACACCGCC ACCTTCCGGC TGCCCCGCGA AACCGAGTTG

1201 CAAACCACTT TGGGCTTCAA TTATTTCCAC AACGAATACG GCAATVAACCG

1251 CTTTCCTGAA GAATTGGGGC TGTTTTTCGA CGGTCCTGAT CAGGACAACG

1301 GGCTTTATTC CTATTTGGGG CGGTTTAAGG GCGATAAAGG GCTGCTGCCC

1351 CAAAAATCAA CCATTGTCCA ACCGGCCGGC AGCCAATATT TCAACACGTT

1401 CTACTTCGAT GCCGCGCTCA AAAAAGACAT TTACCGCTTA AACTACAGCA

1451 CCAATACCGT CGGCTACCGT TTCGGCGGCG AATATACGGG CTATTACGGC

1501 TCGGATGACG AATTTAAGCG GGCATTCGGA GAAAACTCGC CGACATACAA

1551 GAAACATTGC AACCGGAGCT GCGGGATTTA TGAACCCGTA TTGAAAAAAT

1601 ACGGCAAAAA GCGCGCCAAC AACCATTCGG TCAGCATTAG TGCGGACTTC

1651 GGCGATTATT TCATGCCGTT CGCCAGCTAT TCGCGCACAC ACCGTATGCC

1701 CAACATCCAA GAAATGTATT TTTCCCAAAT CGGCGACTCC GGCGTTCACA

1751 CCGCCTTAAA ACCAGAGCGC GCAAACACTT GGCAATTTGG CTTCAATACC

1801 TATAAAAAAG GATTGTTAAA ACAAGATGAT ACATTAGGAT TAAAACTGGT

1851 CGGCTACCGC AGCCGCATCG ACAACTACAT CCACAACGTT TACGGGAAAT

1901 GGTGGGATTT GAACGGGGAT ATTCCGAGCT GGGTCAGCAG CACCGGGCTT

1951 GCCTACACCA TCCAACATCG CAATTTCAAA GACAAAGTGC ACAAACACGG

2001 TTTTGAGTTG GAGCTGAATT ACGATTATGG GCGTTTTTTC ACCAACCTTT

2051 CTTACGCCTA TCAAAAAAGC ACGCAACCGA CCAACTTCAG CGATGCGAGC

2101 GAATCGCCCA ACAATGCGTC CAAAGAAGAC CAACTCAAAC AAGGTTATGG

2151 GTTGAGCAGG GTTTCCGCCC TGCCGCGAGA TTACGGACGT TTGGAAGTCG

2201 GTACGCGCTG GTTGGGCAAC AAACTGACTT TGGGCGGCGC GATGCGCTAT

2251 TTCGGCAAGA GCATCCGCGC GACGGCTGAA GAACGCTATA TCGACGGCAC

2301 CAACGGGGGA AATACCAGCA ATTTCCGGCA ACTGGGCAAG CGTTCCATCA

2351 AACAAACCGA AACTCTTGCC CGCCAGCCTT TGATTTTTGA TTTTTACGCC

2401 GCTTACGAGC CGAAGAAAAA CCTTATTTTC CGCGCCGAAG TCAAAAATCT

2451 GTTCGACAGG CGTTATATCG ATCCGCTCGA TGCGGGCAAT GATGCGGCAA

2501 CGCAGCGTTA TTACAGCTCG TTCGACCCGA AAGACAAGGA CGAAGACGTA

2551 ACGTGTAATG CTGATAAAAC GTTGTGCAAC GGCAAATACG GCGGCACAAG

2601 CAAAAGCGTA TTGACCAATT TTGCACGCGG ACGCACCTTT TTGATGACGA

2651 TGAGCTACAA GTTTTAA




This corresponds to the amino acid sequence <SEQ ID 878; ORF133-1>:



1 EAQIQVLEDV HVKAKRVPKD KKVFTDARAV STRQDIFKSS ENLDNIVRSI

51 PGAFTQQDKS SGIVSLNIRG DSGFGRVNTM VDGITQTFYS TSTDAGRAGG

101 SSQFGASVDS NFIAGLDVVK GSFSGSAGIN SLAGSANLRT LGVDDVVQGN

151 NTYGLLLKGL TGTNSTKGNA MAAIGARKWL ESGASVGVLY GHSRRSVAQN

201 YRVGGGGQHI GNFGAEYLER RKQRYFVQEG ALKFNSDSGK WERDLQRQQW

251 KYKPYKNYNN QELQKYIEEH DKSWRENLXP QYDITPIDPS SLKQQSAGNL

301 FKLEYDGVFN KYTAQFRDLN TKIGSRKIIN RNYQFNYGLS LNPYTNLNLT

351 AAYNSGRQKY PKGSKFTGWG LLKDFETYNN AKILDLNNTA TFRLPRETEL

401 QTTLGFNYFH NEYGKNRFPE ELGLFFDGPD QDNGLYSYLG RFKGDKGLLP

451 QKSTIVQPAG SQYFNTFYFD AALKKDIYRL NYSTNTVGYR FGGEYTGYYG

501 SDDEFKRAFG ENSPTYKKHC NRSCGIYEPV LKKYGKKRAN NHSVSISADF

551 GDYFMPFASY SRTHRMPNIQ EMYFSQIGDS GVHTALKPER ANTWQFGFNT

601 YKKGLLKQDD TLGLKLVGYR SRIDNYIHNV YGKWWDLNGD IPSWVSSTGL

651 AYTIQHRNFK DKVHKHGFEL ELNYDYGRFF TNLSYAYQKS TQPTNFSDAS

701 ESPNNASKED QLKQGYGLSR VSALPRDYGR LEVGTRWLGN KLTLGGAMRY

751 FGKSIRATAE ERYIDGTNGG NTSNFRQLGK RSIKQTETLA RQPLIFDFYA

801 AYEPKKNLIF RAEVKNLFDR RYIDPLDAGN DAATQRYYSS FDPKDKDEDV

851 TCNADKTLCN GKYGGTSKSV LTNFARGRTF LMTMSYKF*
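
Position labels in a listing of this kind give the index of the first residue on each row, so each label should equal one plus the number of residues on all preceding rows. A small consistency check, sketched here with hypothetical names and toy data, can flag a mislabelled row.

```python
def mislabelled_rows(rows: list[tuple[int, str]]) -> list[tuple[int, int]]:
    """Return (expected_label, found_label) for every row whose label is off."""
    problems = []
    expected = 1
    for label, residues in rows:
        if label != expected:
            problems.append((expected, label))
        # Spaces only separate 10-residue groups; they are not residues.
        expected += len(residues.replace(" ", ""))
    return problems


# Toy rows (not taken from the listing): the third label should be 21.
rows = [(1, "AAAAAAAAAA"), (11, "CCCCC CCCCC"), (31, "GGGGGGGGGG")]
print(mislabelled_rows(rows))  # [(21, 31)]
```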



Computer analysis of this amino acid sequence gave the following results: 
Homology with the probable TonB-dependent receptor HI121 of H. influenzae (accession number U32801)
ORF133 and HI121 show 57% aa identity in 363aa overlap:

lYEPVLKKYGKKRANNHSVSISADFGDYFMPFASYSRTHRMPNIQEMYFSQIGDSGVHTA 90 
I EP+L K G K+A NHS ++SA+ DYFMPF +YSRTHRMPNIQEM+FSQ+ ++GV+TA 
INEPILHKSGHKKAFNHSATLSAELSDYFMPFFTYSRTHRMPNIQEMFFSQVSNAGVNTA 622 

LKPERANTWQFG FXT YKKGLLKQDDTLGLKLVGYRSRI DNYIHNVYGKWWDLNGDI PSWV 150 
LKPE+++T+Q GF TYKKGL QDD LG+KLVGYRS I NYIHNVYG WW +P+W 
LKPEQSDTYQLGFNTYKKGLFTQDDVLGVKLVGYRSFIKNYIHNVYGVWW — RDGMPTWA 680 



Orf 133 : 


31 


HI121 : 


563 


Orf 133 : 


91 


HI121 ; 


623 


Orf 133 : 


151 


HI121; 


681 


Orf 133 : 


211 


HI121: 


741 


Orf 133: 


271 


HI121: 


801 


Orfl33: 


331 


HI121: 


860 


Orf 133: 


391 


HI121: 


911 



S G YTI H+ + 



YD GRFF N+SYAYQ++ QPTN++DAS PNN 



AS+ED LKQGYGLSRVS LP+DYGRLE+GTRW KLTLG A RY+GKS RAT EE YI+ 



G+ 



R+ 



++K+TE + +QP+I D + +YEP K+LI +AEV+NL D+RY+DP 



LDAGNDAA +RYYSS 



YKF 



+ + C D + C GG+ K+VL NFARGRT++++++ 
-NNSIECAQDSSAC GGSDKTVLYNFARGRTYILSLN 910 



Homology with a predicted ORF from N. meningitidis (strain A)

ORF133 shows 90.8% identity over a 392aa overlap with an ORF (ORF133a) from strain A of N.
meningitidis:



10 20 30 

orf133.pep PGYYGSDDEFKRAFGENSPTXKKHCNRSCGI

Ml I I I I I I i I I I I I I I I I t I I : t I I I 
orf 133a FYFDAALKKDIYRLNYSTNTVGYRFGGXYTGYYXSDDEFKRAFGENSPTYXKHCNQSCGI 
450 460 470 480 490 500 



40 50 60 70 80 90

orf 133. pep YEPVLKKYGKKRANNHSVSISADFGDYFMPFASYSRTHRMPNIQEMYFSQIGDSGVHTAL 
I M t I t I I i I I I I M I I I I I I [ I I I I i I I I I t M I I I I I I I I I I M I I t I I I I I I I I I I I 
orf 133a YEPVLKKYGKKRANNHSVSISADFGDYFMPFASYSRTHRMPNIQEMYFSQIGDSGVHTAL 
510 520 530 540 550 560 


100 110 120 130 140 150 

orf 133. pep KPERANTWQFGFXTYKKGLLKQDDTLGLKLVGYRSRIDNYIHNVYGKWWDLNGDIPSWVS 
I t I I I M I t I I I i I I I I t I i I I t I I I I I I I I I I t i I i t I I M i I I t I I I I : I I t i t I 
orf 133a KPERANTWQFGFNTYKKGLLKQDDILGLKLVGYRSRIDXYIHNVYGKWWDLNGNIPSWVS 
570 580 590 600 610 620



160 170 180 190 200 210 

orf 133 . pep STGLAYTIQHRXFXDKVHQXXXXXXXXYDYGRFFTNLSYAYQKSTQPTNFSDASESPNNA 
I t I t I M I I t I I I M I : III I t I I I II I I I I i II I I I I I i I I II II It I 

orf133a STGLAYTIQHRNFKDKVHKHGFELELNYDYXRFFTNLSYAYQKSTQPTNFSDASESPNNA

630 640 650 660 670 680 



220 230 240 250 260 270 

orf133.pep SKEDQLKQGYGLSRVSALPRDYGRLEVGTRWLGNKLTLGGAMRYFGKSIRATAEERYIDG

1 1 1 II 1 1 1 1 II 1 1 1 1 1 1 1 1 II I M 1 1 1 1 II M II 1 1 1 1 1 II 1 1 1 1 1 1 1 1 1 1 II I n I I i

orf 133a SKEDQLKQGYGLSRVSALPRDYGRLEVGTRWLGNKLTLGGAMRYFGKSIRATAEERYIDX 
690 700 710 720 730 740 



280 290 300 310 320 330

orf133.pep TNGGNTSNFRQLGKRSIKQTETLARQPLIXDFNAAYEPKKNLIFRAEVKNLFDRRYIDPL
IN I I I I I I i M M I I I I I I I 1 1 i It I I I I I I I I I I I I I It I I I i I I 1 1 1 1 i I
orf133a TNGXXTSNFRQLGKRSIXQTETLARQPLIFDXYAAYEPKKXLIFRAEVKNLFDRRYIDPL
750 760 770 780 790 800 

340 350 360 370 380 390 

orf133.pep DAGNDAAXERYYSSFDPKDKDXDVTCNADKTLCNGKYGGTSKSVLTNFARGRTFLMTMSY
1 I I i I I i : : I I i I I I I i I I t I : I I I I I : I I I I I 1 I t I I I i I I t I I I I I t i 11 : I I t i
orf133a DAGNDAATQRYYSSFDPKDKDEEVTCNDDNTLCNGKYGGTSKSVLTNFARGXTFLITMSY
810 820 830 840 850 860 



orf133.pep KFX
i I I
orf133a KFX
870 

A partial ORF133a nucleotide sequence <SEQ ID 879> is: 

1 AAAGACAAAA AAGTGTTTAC CGATGCGCGT GCCGTATCGA CCCGTCAGGA 

51 TATATTCAAA TCCANCGAAA ACCTCGACAA CATCGTACGC ANCATCCCCG 

101 GTGCGTTTAC ACANCAANAT AAAAGCTCGG GCNTTGTGTC TTTGAATATT 

151 CGCNGCGACA GCGGGTTCGG GCGGGTCAAT ACNATGGTNG ACGGCATCAC 

201 NCANACCTTT TATTCGACTT CTACCGATGC GGGCAGGGCA GGCGGTTCAT 

251 CTCAATTCGG TGCATCTGTC GACAGCAATT TTATNGCCGG ACTGGATGTC 

301 GTCAAAGGCA GCTTCAGCGG CTCGGCAGGC ATCAACAGCC TTGCCGGTTC 

351 GGCGAATCTG CGGACTTTAN GCGTGGATGA TGTCGTTCAG GGCAATANTA 

401 CNTACGGCCT GCTGCTAAAA GGTCTGACCG GCACCAATTC AACCAAAGGT 

451 AATGCGATGG CGGCGATAGG TGCGCGCAAA TGGCTGGAAA GCGGAGCATC 

501 TGTCGGTGTG CTTTACGGGC ACAGCAGGCG CAGCGTGGCG CAAAATTACC 

551 GCGTGGGCGG CGGCGGGCAG CACATCGGAA ATTTTGGCGC GGAATATCTG 

601 GAACGACGCA AGCAACGATA TTTTGAGCAA GAAGGCGGGT TGAAATTCAA 

651 TTCCAACAGC GGAAAATGGG AGCGGGATTT CCAAAAGTCG TACTGGAAAA 

701 CCAAGTGGTA TCAAAAATAC GATGCCCCCC AAGAACTGCA AAAATACATC 

751 GAAGGTCATG ATAAAAGCTG GCGGGAAAAC CTGGCGCCGC AATACGACAT 

801 CACCCCCATC GATCCGTCCA GCCTGAAGCN GCAGTCGGCA GGCAACCTGT 

851 TTAAATTGGA ATACGACGGC GTATTCAATA AATACACGGC GCAATTTCGC 

901 GATTTAAACA CCAAAATCGG CAGCCGCAAA ATCATCAACC GCAATTATCA 

951 ATTCAATTAC GGTTTGTCTT TGAACCCGTA TACCAACCTC AATCTGACCG 

1001 CAGCCTACAA TTCGGGCAGG CAGAAATATC CGAAAGGGTC GAAGTTTACA 

1051 GGCTGGGGGC TTTTNAAAGA TTTTGAAACC TACAACAACG CAAAAATCCT 

1101 CGACCTCANC AACACCTCCA CCTTCCGGCT GCCCCGTGAA ACCGAGTTGC 

1151 AAACCACTTT GGGCTTCAAT TATTTCCACA ACGAATACGG CAAAAACCGC 

1201 TTTCCTGAAG AATTGGGGCT GTTTTTCGAC GGTCCGGATC ANGACAACGG 

1251 GCTTTATTCC TATTTGGGGC GGTTTAAGGG CGATAAAGGG CTGCTGCCCC 

1301 AAAAATCAAC CATTGTCCAA CCGGCCGGCA GCCAATATTT CAACACGTTC 

1351 TACTTCGATG CCGCGCTCAA AAAAGACATT TACCGCTTAA ACTACAGCAC 

1401 CAATACCGTC GGCTACCGTT TCGGCGGCNA ATATACGGGC TATTACNGCT 

1451 CGGATGACGA ATTTAAGCGG GCATTCGGAG AAAACTCGCC GACATACANG 

1501 AAACATTGCA ACCAGAGCTG CGGAATTTAT GAACCCGTAT TGAAAAAATA 

1551 CGGCAAAAAG CGCGCCAACA ACCATTCGGT CAGCATTAGT GCGGACTTCG 

1601 GCGATTATTT CATGCCGTTC GCCAGCTATT CGCGCACACA CCGTATGCCC 

1651 AACATCCAAG AAATGTATTT TTCCCAAATC GGCGACTCCG GCGTTCACAC 

1701 CGCCTTAAAA CCAGAGCGCG CAAACACTTG GCAATTTGGC TTCAATACCT 

1751 ATAAAAAAGG ATTGTTAAAA CAAGATGATA TATTAGGATT AAAACTGGTC 

1801 GGCTACCGCA GCCGCATCGA CNACTACATC CACAACGTTT ACGGGAAATG 

1851 GTGGGATTTG AACGGGAATA TTCCGAGCTG GGTCAGCAGC ACCGGGCTTG 

1901 CCTACACCAT CCAACACCGC AATTTCAAAG ACAAAGTGCA CAAACACGGT 

1951 TTTGAGTTGG AGCTGAATTA CGATTATNGG CGTTTTTTCA CCAACCTTTC 

2001 TTACGCCTAT CAAAAAAGCA CGCAACCGAC CAACTTCAGC GATGCGAGCG 

2051 AATCGCCCAA CAATGCGTCC AAAGAAGACC AACTCAAACA AGGTTATGGG 

2101 TTGAGCAGGG TTTCCGCCCT GCCGCGAGAT TACGGACGTT TGGAAGTCGG 

2151 TACGCGCTGG TTGGGCAACA AACTGACTTT GGGCGGCGCG ATGCGCTATT 

2201 TCGGCAAGAG CATCCGCGCG ACGGCTGAAG AACGCTATAT CGACGNCACC 

2251 AATGGGGNAN NTACCAGCAA TTTCCGGCAA CTGGGCAAGC GTTCCATCAN 

2301 ACAAACCGAA ACCCTTGCCC GCCAGCCTTT GATTTTTGAT TTNTACGCCG 

2351 CTTACGAGCC GAAGAAAAAN CTTATTTTCC GCGCCGAAGT CAAAAATCTG

2401 TTCGACAGGC GTTATATCGA TCCGCTCGAT GCGGGCAATG ATGCGGCAAC 

2451 GCAGCGTTAT TACAGTTCGT TCGACCCGAA AGACAAGGAC GAAGAAGTAA 

2501 CGTGTAATGA TGATAACACG TTATGCAACG GCAAATACGG CGGCACAAGC 

2551 AAAAGCGTAT TGACCAATTT TGCACGCGGA CNCACCTTTT TGATAACGAT 

2601 GAGCTACAAG TTTTAA 
This encodes a protein having (partial) amino acid sequence <SEQ ID 880>:



1 KDKKVFTDAR AVSTRQDIFK SXENLDNIVR XIPGAFTXQX KSSGXVSLNI

51 RXDSGFGRVN TMVDGITXTF YSTSTDAGRA GGSSQFGASV DSNFXAGLDV

101 VKGSFSGSAG INSLAGSANL RTLXVDDVVQ GNXTYGLLLK GLTGTNSTKG

151 NAMAAIGARK WLESGASVGV LYGHSRRSVA QNYRVGGGGQ HIGNFGAEYL

201 ERRKQRYFEQ EGGLKFNSNS GKWERDFQKS YWKTKWYQKY DAPQELQKYI

251 EGHDKSWREN LAPQYDITPI DPSSLKXQSA GNLFKLEYDG VFNKYTAQFR

301 DLNTKIGSRK IINRNYQFNY GLSLNPYTNL NLTAAYNSGR QKYPKGSKFT

351 GWGLXKDFET YNNAKILDLX NTSTFRLPRE TELQTTLGFN YFHNEYGKNR

401 FPEELGLFFD GPDXDNGLYS YLGRFKGDKG LLPQKSTIVQ PAGSQYFNTF

451 YFDAALKKDI YRLNYSTNTV GYRFGGXYTG YYXSDDEFKR AFGENSPTYX

501 KHCNQSCGIY EPVLKKYGKK RANNHSVSIS ADFGDYFMPF ASYSRTHRMP

551 NIQEMYFSQI GDSGVHTALK PERANTWQFG FNTYKKGLLK QDDILGLKLV

601 GYRSRIDXYI HNVYGKWWDL NGNIPSWVSS TGLAYTIQHR NFKDKVHKHG

651 FELELNYDYX RFFTNLSYAY QKSTQPTNFS DASESPNNAS KEDQLKQGYG

701 LSRVSALPRD YGRLEVGTRW LGNKLTLGGA MRYFGKSIRA TAEERYIDXT

751 NGXXTSNFRQ LGKRSIXQTE TLARQPLIFD XYAAYEPKKX LIFRAEVKNL

801 FDRRYIDPLD AGNDAATQRY YSSFDPKDKD EEVTCNDDNT LCNGKYGGTS

851 KSVLTNFARG XTFLITMSYK F*



ORF133a and ORF133-1 show 94.3% identity in 871 aa overlap: 



10 20 30 40 

orf 133a . pep KDKKVFTDARAVSTRQDIFKSXENLDNIVRXIPGAFTXQXKS 

i i I I I I I t I I I I I I I I I I i I I I I I [ I I I I i I I [ I I [ II 
orf 133-1 EAQIQVLEDVHVKAKRVPKDKKVFTDARAVSTRQDIFKSSENLDNIVRSIPGAFTQQDKS 

10 20 30 40 50 60 



50 60 70 80 90 100 

orf 133a . pep SGXVSLNIRXDSGFGRVNTMVDGITXTFYSTSTDAGRAGGSSQFGASVDSNFXAGLDWK 
II Mini i I I t I I I I I I I II II I M I I I II I I I I I I I I I I M I I I I II i t t I II I 
orf 133-1 SGIVSLNIRGDSGFGRVNTMVDGITQTFYSTSTDAGRAGGSSQFGASVDSNFIAGLDWK 

70 80 90 100 110 120 



110 120 130 140 150 160 

orf 133a . pep GSFSGSAGINSLAGSANLRTLXVDDWQGNXTYGLLLKGLTGTNSTKGNAMAAIGARKWL 
II I t I I I I I I I I I t i t I I I I I I I I I I I II II I I I I I I I I I I I I I II I I I II I I I I I II 
orf 133-1 GSFSGSAGINSLAGSANLRTLGVDDWQGNNTYGLLLKGLTGTNSTKGNAMAAIGARKWL 

130 140 150 160 170 180 



170 180 190 200 210 220 

orf 133a . pep ESGASVGVLYGHSRRSVAQNYRVGGGGQHIGNFGAEYLERRKQRYFEQEGGLKFNSNSGK 
I I I I II II I I I i I I I M I I II I I I I t I I t I I I I I II II I I I I I I I I II I : I I I I I : I I I 
orf 133-1 ESGASVGVLYGHSRRSVAQNYRVGGGGQHIGNFGAEYLERRKQRYFVQEGALKFNSDSGK 

190 200 210 220 230 240 



230 240 250 260 270 280 

orf 133a . pep WERDFQKSYWKTKWYQKYDAPQELQKYIEGHDKSWRENLAPQYDITPIDPSSLKXQSAGN 
M I I : I : : II I I : : t : I I I I I I I I I I I t t I I 11 I I I II I I I I t I I I I I I I II 
orf 133-1 WERDLQRQQWKYKPYKNYNN-QELQKYIEEHDKSWRENLXPQYDITPIDPSSLKQQSAGN 

250 260 270 280 290 



290 300 310 320 330 340 

orf 133a . pep LFKLEYDGVFNKYTAQFRDLNTKIGSRKIINRNYQFNYGLSLNPYTNLNLTAAYNSGRQK 
I I I I I I I I I I II I I I I I I I I I I I I I II I II I I II I I I I I I I I I t I I I I I I I I II I I I I I I 
orf133-1 LFKLEYDGVFNKYTAQFRDLNTKIGSRKIINRNYQFNYGLSLNPYTNLNLTAAYNSGRQK

300 310 320 330 340 350 



350 360 370 380 390 400 

orf 133a . pep YPKGSKFTGWGLXKDFETYNNAKILDLXNTSTFRLPRETELQTTLGFNYFHNEYGKNRFP 
t I I I It I I I I I I I I I I I I I I I I I I 1 I i I : I I I I I M I I t I I i I i M I M I I II I II I I 
orf 133-1 YPKGSKFTGWGLLKDFETYNNAKILDLNNTATFRLPRETELQTTLGFNYFHNEYGKNRFP 
360 370 380 390 400 410 



410 420 430 440 450 460 

orf 133a . pep EELGLFFDGPDXDNGLYSYLGRFKGDKGLLPQKSTIVQPAGSQYFNTFYFDAALKKDIYR 
I II I I t I M I I I I I I I t I I II I I I I I I I i I I I I I I I I I I I I I I I I 1 I I t t I I t I I I I I I 
orf 133-1 EELGLFFDGPDQDNGLYSYLGRFKGDKGLLPQKSTIVQPAGSQYFNTFYFDAALKKDIYR 
420 430 440 450 460 470 






470 . . 480 490 500 510 520 

orf133a.pep LNYSTNTVGYRFGGXYTGYYXSDDEFKRAFGENSPTYXKHCNQSCGIYEPVLKKYGKKRA
I I I i I t I I I I I t I I Mill Itlllllllltillll ItlhllllllMIIIIIIIII 
orf 133-1 LNYSTNTVGYRFGGEYTGYYGSDDEFKRAFGENSPTYKKHCNRSCGIYEPVLKKYGKKRA 
480 490 500 510 520 530 

530 540 550 560 570 580 

orf 133a . pep NNHSVSISADFGDYFMPFASYSRTHRMPNIQEMYFSQIGDSGVHTALKPERANTWQFGFN 
I I I I I I I II I I I I I I I II I t I i I I i 1 II I I II I I I II I I t I I II II I I I I t M t I M t 11 
orf 133-1 NNHSVSISADFGDYFMPFASYSRTHRMPNIQEMYFSQIGDSGVHTALKPERANTWQFGFN 
540 550 560 570 580 590 

590 600 610 620 630 640 

orf 133a . pep TYKKGLLKQDDILGLKLVGYRSRIDXYIHNVYGKWWDLNGNIPSWVSSTGLAYTIQHRNF 
lllllllllll lllllllllllll MIIIIIIIIIIII:|lllliillllllll[ilt 
orf 133-1 TYKKGLLKQDDTLGLKLVGYRSRIDNYIHNVYGKWWDLNGDIPSWVSSTGLAYTIQHRNF 
600 610 620 630 640 650 

650 660 670 680 690 700 

orf 133a . pep KDKVHKHGFELELNYDYXRFFTNLSYAYQKSTQPTNFSDASESPNNASKEDQLKQGYGLS 

I t II i I I I I I I I t I I i [ II II I I I I i II I I I I I I II I I t I I I I I [ t I I I II II I I II II 
orf 133-1 KDKVHKHGFELELNYDYGRFFTNLSYAYQKSTQPTNFSDASESPNNASKEDQLKQGYGLS 

660 670 680 690 700 710 

710 720 730 740 750 760 

orf 133a . pep RVSALPRDYGRLEVGTRWLGNKLTLGGAMRYFGKSIRATAEERYIDXTNGXXTSNFRQLG 

II t t I I II II I I I t II II I I II I I I I I M I I I I II I I I I I Ml IIIIIMI 

orf133-1 RVSALPRDYGRLEVGTRWLGNKLTLGGAMRYFGKSIRATAEERYIDGTNGGNTSNFRQLG

720 730 740 750 760 770 

770 780 790 800 810 820 

orf 133a . pep KRSIXQTETLARQPLIFDXYAAYEPKKXLIFRAEVKNLFDRRYIDPLDAGNDAATQRYYS 
MM I I I i II t I i II II I I I II II I M M I II II II I II I I 1 M M M II II II M I 
orf 133-1 KRSIKQTETLARQPLIFDFYAAYEPKKNLIFRAEVKNLFDRRYIDPLDAGNDAATQRYYS 
780 790 800 810 820 830

830 840 850 860 870 

orf 133a . pep SFDPKDKDEEVTCNDDNTLCNGKYGGTSKSVLTNFARGXTFLITMSYKFX 
I I II I I I I I : I I I I I M I II M M II I I I I II I I M I li I M I II I I I 
orf 133-1 SFDPKDKDEDVTCNADKTLCNGKYGGTSKSVLTNFARGRTFLMTMSYKFX 
840 850 860 870 880 

Homology with a predicted ORF from N. gonorrhoeae

ORF133 shows 92.3% identity over 392 aa overlap with a predicted ORF (ORF133ng) from N.
gonorrhoeae:

orf133.pep PGYYGSDDEFKRAFGENSPTXKKHCNRSCGI 31
Mlll::illll]lllll: Mil: MM
orf133ng FYFDAALKKDIYRLNYSTNAINYRFGGEYTGYYGSENEFKRAFGENSPAYKEHCDPSCGL 560

orf133.pep YEPVLKKYGKKRANNHSVSISADFGDYFMPFASYSRTHRMPNIQEMYFSQIGDSGVHTAL 91
II I I II M M I M I I I II M I M M I I II I I M II I I I I M I I I I I I i I I M I M I I I M
orf133ng YEPVLKKYGKKRANNHSVSISADFGDYFMPFAGYSRTHRMPNIQEMYFSQIGDSGVHTAL 620

orf133.pep KPERANTWQFGFXTYKKGLLKQDDTLGLKLVGYRSRIDNYIHNVYGKWWDLNGDIPSWVS 151
ItllllllliM IIIIIMIIII IIIIIMIMIIIIII IMIM :
orf133ng KPERANTWQFGFNTYKKGLLKQDDILGLKLVGYRSRIDNYIHNVYGKWWDLNGDIPSWVG 680

orf133.pep STGLAYTIQHRXFXDKVHQXXXXXXXXYDYGRFFTNLSYAYQKSTQPTNFSDASESPNNA 211
MIMMMII I MM: I I I II I I I II M I I M I M M I II II I II I I I I
orf133ng STGLAYTIRHRNFKDKVHKHGFELELNYDYGRFFTNLSYAYQKSTQPTNFSDASESPNNA 740

orf133.pep SKEDQLKQGYGLSRVSALPRDYGRLEVGTRWLGNKLTLGGAMRYFGKSIRATAEERYIDG 271
II I M I I M I I II I II II M li I M II M II I I I I II M I I I I I II II II I M II I I M I
orf133ng SKEDQLKQGYGLSRVSALPRDYGRLEVGTRWLGNKLTLGGAMRYFGKSIRATAEERYIDG 800

orf133.pep TNGGNTSNFRQLGKRSIKQTETLARQPLIXDFNAAYEPKKNLIFRAEVKNLFDRRYIDPL 331
M 1 I I II I I II ) M I II I I I I I M i I II M I I M I I I I I I I II II i M I M I I II M
orf133ng TNGGNTSNVRQLGKRSIKQTETLARQPLIFDFYAAYEPKKNLIFRAEVKNLFDRRYIDPL 860
orf133.pep DAGNDAAXERYYSSFDPKDKDXDVTCNADKTLCNGKYGGTSKSVLTNFARGRTFLMTMSY 391
I t I I I I I : : M i I I t I I I i I I 1 It I I I I t I I I I i I I I I I I I I I I t M t I I I I I I I I I I I
orf133ng DAGNDAATQRYYSSFDPKDKDEDVTCNADKTLCNGKYGGTSKSVLTNFARGRTFLMTMSY 920

orf133.pep KF 393
I I
orf133ng KF 922

The complete length ORF133ng nucleotide sequence <SEQ ID 881> is predicted to encode a
protein having amino acid sequence <SEQ ID 882>: 

1 MRSSFRLKPI CFYLMGVMLY HHSYAEDAGR AGSEAQIQVL EDVHVKAKRV 

51 PKDKKVFTDA RAVSTRQDVF KSGENLDNIV RSIPGAFTQQ DKSSGIVSLN 

101 IRGDSGFGRV NTMVDGITQT FYSTSTDAGR AGGSSQFGAS VDSNFIAGLD 

151 VVKGSFSGSA GINSLAGSAN LRTLGVDDVV QGNNTYGLLL KGLTGTNSTK

201 GNAMAAIGAR KWLESGASVG VLYGHSRRGV AQNYRVGGGG QHIGNFGEEY 

251 LERRKQQYFV QEGGLKFNAG SGKWERDLQR QYWKTKWYKK YEDPQELQKY 

301 IEEHDKSWRE NLAPQYDITP IDPSGLKQQS AGNLLWLEYD GVFNKYTAQF

351 RDLNTRIGSR KIINRNYQFN YGLSLNPYTN LNLTAAYNSG RQKYPKGAKF 

401 TGWGLLKDFE TYNNAKILDL NNTATFRLPR ETELQTTLGF NYFHNEYGKN 

451 RFPEELGLFF DGPDQDNGLY SYLGRFKGDK GLLPQKSTIV QPAGSQYFNT 

501 FYFDAALKKD IYRLNYSTNA INYRFGGEYT GYYGSENEFK RAFGENSPAY

551 KEHCDPSCGL YEPVLKKYGK KRANNHSVSI SADFGDYFMP FAGYSRTHRM 

601 PNIQEMYFSQ IGDSGVHTAL KPERANTWQF GFNTYKKGLL KQDDILGLKL 

651 VGYRSRIDNY IHNVYGKWWD LNGDIPSWVG STGLAYTIRH RNFKDKVHKH 

701 GFELELNYDY GRFFTNLSYA YQKSTQPTNF SDASESPNNA SKEDQLKQGY 

751 GLSRVSALPR DYGRLEVGTR WLGNKLTLGG AMRYFGKSIR ATAEERYIDG

801 TNGGNTSNVR QLGKRSIKQT ETLARQPLIF DFYAAYEPKK NLIFRAEVKN 

851 LFDRRYIDPL DAGNDAATQR YYSSFDPKDK DEDVTCNADK TLCNGKYGGT 

901 SKSVLTNFAR GRTFLMTMSY KF* 

A variant was also identified, being encoded by the gonococcal DNA sequence <SEQ ID 883>: 



1 ATGAGATCTT CTTTCCGGTT GAAGCCGATT TGTTTTTATC TTATGGGTGT

51 TATGCTATAT CATCATAGTT ATGCCGAAGA TGCAGGGCGC GCGGGCAGCG

101 AGGCGCAGAT ACAGGTTTTG GAAGATGTGC ACGTCAAGGC GAAGCGCGTA

151 CCGAAAGACA AAAAAGTGTT TACCGATGCG CGTGCCGTAT CGACCCGTca

201 gGATGTGTTC AAATCCGGCG AAAACCTCGA CAACATCGTA CGCAGCATAC

251 CCGGTGCGTT TACACAGCAA GATAAAAGCT CGGGCATTGT GTCTTTGAAT

301 ATTCGCGGCG ACAGCGGGTT CGGGCGGGTC AATACGATGG TGGACGGCAT

351 CACGCAGACC TTTTATTCGA CTTCTACCGA TGCGGGCAGG GCAGGCGGTT

401 CATCTCAATT CGGTGCATCT GTCGACAGCA ATTTTATTGC CGGACTGGAT

451 GTCGTCAAAG GCAGCTTCAG CGGCTCGGCA GGCATCAACA GCCTTGCCGG

501 TTCGGCGAAT CTGCGGACTT TAGGCGTGGA TGACGTCGTT CAGGGCAATA

551 ATACCTACGG CCTGCTGCTA AAAGGTCTGA CCGGCACCAA TTCAACCAAA

601 GGTAATGCGA TGGCGGCGAT AGGTGCGCGC AAATGGCTGG AAAGCGGAGC

651 GTCTGTCGGT GTGCTTTACG GGCACAGCAG GCGCGGCGTG GCGCAAAATT

701 ACCGCGTGGG CGGCGGCGGG CAGCACATCG GAAATTTTGG TGAAGAATAT

751 CTGGAACGGC GCAAACAGCA ATATTTTGTA CAAGAGGGTG GTTTGAAATT

801 CAATGCCGGC AGCGGAAAAT GGGAACGGGA TTTGCAAAGG CAATACTGGA

851 AAACAAAGTG GTATAAAAAA TACGAAGACC CCCAAGAACT GCAAAAATAC

901 ATCGAAGAGC ATGATAAAAG CTGGCGGGAA AACCTGGCGC CGCAATACGA

951 CATCACCCCC ATCGATCCGT CCGGCCTGAA GCAGCAGTCG GCAGGCAATC

1001 TGTTTAAATT GGAATACGAC GGCGTATTCA ATAAATACAC GGCGCAATTT

1051 CGCGATTTAA ACACCAGAAT CGGCAGCCGC AAAATCATCA ACCGCAATTA

1101 TCAATTCAAT TACGGTTTGT CTTTGAACCC GTATACCAAC CTCAATCTGA

1151 CCGCAGCCTA CAATTCGGGC AGGCAGAAAT ATCCGAAAGG GGCGAAGTTT

1201 ACAGGCTGGG GGCTTTTAAA AGATTTTGAA ACCTACAACA ACGCGAAAAT

1251 CCTCGACCTC AACAACACCG CCACCTTCCG GCTGCCCCGC GAAACCGAGT

1301 TGCAAACCAC TTTGGGCTTC AATTATTTCC ACAACGAATA CGGCAAAAAC

1351 CGCTTTCCTG AAGAATTGGG GCTGTTTTTC GACGGTCCTG ATCAGGACAA

1401 CGGGCTTTAT TCCTATTTGG GGCGGTTTAA GGGCGATAAA GGGCTGTTGC

1451 CTCAAAAATC AACCATTGTC CAACCGGCCG GCAGCCAATA TTTCAACACG

1501 TTCTACTTCG ATGCCGCGCT CAAAAAAGAC ATTTACCGCT TAAACTACAG

1551 CACCAATGCA ATCAACTACC GTTTCGGCGG CGAATATACG GGCTATTACG

1601 GCTCGGAAAA CGAATTTAAG CGGGCATTCG GAGAAAACTC GCCGGCATAC

1651 AAGGAACATT GCGACCCGAG CTGCGGGCTT TATGAACCCG TATTGAAAAA

1701 ATACGGCAAA AAGCGCGCCA ACAACCATTC GGTCAGCATT AGTGCGGACT

1751 TCGGCGATTA TTTCATGCCG TTCGCCGGCT ATTCGCGCAC ACACCGTATG

1801 CCCAACATCC AAGAAATGTA TTTTTCCCAA ATCGGCGACT CCGGCGTTCA

1851 CACCGCCTTA AAACCAGAGC GCGCAAACAC TTGGCAATTT GGCTTCAATA

1901 CCTATAAAAA AGGATTGTTA AAACAAGATG ATATATTAGG ATTGAAACTG

1951 GTCGGCTACC GCAGCCGCAT TGACAACTAC ATCCACAACG TTTACGGGAA

2001 ATGGTGGGAT TTGAACGGGG ATATTCCGAG CTGGGTCGGC AGCACCGGGC

2051 TTGCCTACAC CATCCGACAC CGCAATTTCA AAGACAAAGT GCACAAACAC

2101 GGTTTTGAGC TGGAGCTGAA TTACGATTAT GGGCGTTTTT TCACCAACCT

2151 TTCTTACGCC TATCAAAAAA GCACGCAACC GACCAATTTC AGCGATGCGA

2201 GCGAATCGCC CAACAATGCC tccaaAGAAG ACCAACTCAA ACAAGGTTAT

2251 GGGCTGAGCA GGGTTTCCGC CCTGCCGCGA GATTACGGAC GTTTGGAAGT

2301 CGGTACGCGC TGGTTGGGCA ACAAACTGAC TTTGGGCGGC GCGAtgcGCT

2351 ATTTCGGCAA GAGCATCCGC GCGACGGCTG AAGAACGCTA TATCGACGGC

2401 ACCAACGGGG GAAATACCAG CAATGTCCGG CAACTGGGCA AGCGTTCCAT

2451 CAAACAAACC GAAACCCTTG CCCGACAGCC TTTGATTTTT GATTTTTACG

2501 CCGCTTACGA GCCGAAGAAA AACCTTATTT TCCGCGCCGA AGTCAAAAAC

2551 CTGTTCGACA GGCGTTATAT CGATCCGCTC GATGCGGGCA ATGATGCGGC

2601 AACGCAGCGT TATTACAGCT CGTTCGACCC GAAAGACAAG GACGAAGACG

2651 TAACGTGTAA TGCTGATAAA ACGTTGTGCA ACGGCAAATA CGGCGGCACA

2701 AGCAAAAGCG TATTGACCAA TTTCGCACGC GGACGCACCT TCTTGATGAC

2751 GATGAGCTAC AAGTTTTAA




This corresponds to the amino acid sequence <SEQ ID 884; ORF133ng-1>:



1 MRSSFRLKPI CFYLMGVMLY HHSYAEDAGR AGSEAQIQVL EDVHVKAKRV

51 PKDKKVFTDA RAVSTRQDVF KSGENLDNIV RSIPGAFTQQ DKSSGIVSLN 

101 IRGDSGFGRV NTMVDGITQT FYSTSTDAGR AGGSSQFGAS VDSNFIAGLD 

151 VVKGSFSGSA GINSLAGSAN LRTLGVDDVV QGNNTYGLLL KGLTGTNSTK

201 GNAMAAIGAR KWLESGASVG VLYGHSRRGV AQNYRVGGGG QHIGNFGEEY 

251 LERRKQQYFV QEGGLKFNAG SGKWERDLQR QYWKTKWYKK YEDPQELQKY 

301 IEEHDKSWRE NLAPQYDITP IDPSGLKQQS AGNLFKLEYD GVFNKYTAQF

351 RDLNTRIGSR KIINRNYQFN YGLSLNPYTN LNLTAAYNSG RQKYPKGAKF 

401 TGWGLLKDFE TYNNAKILDL NNTATFRLPR ETELQTTLGF NYFHNEYGKN 

451 RFPEELGLFF DGPDQDNGLY SYLGRFKGDK GLLPQKSTIV QPAGSQYFNT 

501 FYFDAALKKD IYRLNYSTNA INYRFGGEYT GYYGSENEFK RAFGENSPAY

551 KEHCDPSCGL YEPVLKKYGK KRANNHSVSI SADFGDYFMP FAGYSRTHRM 

601 PNIQEMYFSQ IGDSGVHTAL KPERANTWQF GFNTYKKGLL KQDDILGLKL 

651 VGYRSRIDNY IHNVYGKWWD LNGDIPSWVG STGLAYTIRH RNFKDKVHKH 

701 GFELELNYDY GRFFTNLSYA YQKSTQPTNF SDASESPNNA SKEDQLKQGY 

751 GLSRVSALPR DYGRLEVGTR WLGNKLTLGG AMRYFGKSIR ATAEERYIDG 

801 TNGGNTSNVR QLGKRSIKQT ETLARQPLIF DFYAAYEPKK NLIFRAEVKN

851 LFDRRYIDPL DAGNDAATQR YYSSFDPKDK DEDVTCNADK TLCNGKYGGT

901 SKSVLTNFAR GRTFLMTMSY KF* 

ORF133ng-1 and ORF133-1 show 96.2% identity in 889 aa overlap:



10 20 30 40 50 60 

orf 133ng-l .pep SFRLKPICFYLMGVMLYHHSYAEDAGRAGSEAQIQVLEDVHVKAKRVPKDKKVFTDARAV 

I I I I M I I I I I I I I I I I I I I I I i t I I I t I i 
orf 133-1 EAQIQVLEDVHVKAKRVPKDKKVFTDARAV 

10 20 30 



70 80 90 100 110 120 

orf 133ng-l . pep STRQDVFKSGENLDNIVRSIPGAFTQQDKSSGIVSLNIRGDSGFGRVNTMVDGITQTFYS 
lllll:ltl:lllliilllillllllllllltilllllllllllltllllltltllllll 
orf 133-1 STRQDIFKSSENLDNIVRSIPGAFTQQDKSSGIVSLNIRGDSGFGRVNTMVDGITQTFYS 

40 50 60 70 80 90 

130 140 150 160 170 180 

orf 133ng-l .pep TSTDAGRAGGSSQFGASVDSNFIAGLDWKGSFSGSAGINSLAGSANLRTLGVDDWQGN 

illilllMlltlllitlltllllMilllllllllltlMllillM Itllll 

orf 133-1 TSTDAGRAGGSSQFGASVDSNFIAGLDWKGSFSGSAGINSLAGSANLRTLGVDDWQGN 

100 110 120 130 140 150 

190 200 210 220 230 240 

orf 133ng-l . pep NTYGLLLKGLTGTNSTKGNAMAAIGARKWLESGASVGVLYGHSRRGVAQNYRVGGGGQHI 

III IIIMIllllllllMllllilinillllllltlllllltlllllll 

orf 133-1 NTYGLLLKGLTGTNSTKGNAMAAIGARKWLESGASVGVLYGHSRRSVAQNYRVGGGGQHI 

160 170 180 190 200 210 

250 260 270 280 290 300 

orf133ng-1.pep GNFGEEYLERRKQQYFVQEGGLKFNAGSGKWERDLQRQYWKTKWYKKYEDPQELQKYIEE
MM I I I I I t I i : I I i I t I : I I I I : Itlllllltll M t ItllMIII
orf 133-1 GNFGAEYLERRKQRYFVQEGALKFNSDSGKWERDLQRQQWKYKPYKNYNN-QELQKYIEE 

220 230 240 250 260 

310 320 330 340 350 360 

orf 133ng-l . pep HDKSWRENLAPQYDITPIDPSGLKQQSAGNLFKLEYDGVFNKYTAQFRDLNTRIGSRKII 
I I It I I M I I I I I I 1 I I t I t : I I I I I I I t I I I I I I t t I I I I I I I I I I I I I I : I I I i i I 1 
orfl33-l HDKSWRENLXPQYDITPIDPSSLKQQSAGNLFKLEYDGVFNKYTAQFRDLNTKIGSRKII 
270 280 290 300 310 320 


370 380 390 400 410 420 

orf 133ng-l . pep NRNYQFNYGLSLNPYTNLNLTAAYNSGRQKYPKGAKFTGWGLLKDFETYNNAKILDLNNT 

I I I t I I I i I It I I I I I 1 I I I I I t I I I I i I I I I I I : I I t I I [ I I I I I [ I I i I t I t I I I I I I 
orf 133-1 NRNYQFNYGLSLNPYTNLNLTAAYNSGRQKYPKGSKFTGWGLLKDFETYNNAKILDLNNT 

330 340 350 360 370 380 

430 440 450 460 470 480 

orf 133ng-l . pep ATFRLPRETELQTTLGFNYFHNEYGKNRFPEELGLFFDGPDQDNGLYSYLGRFKGDKGLL 

II I I I I M I II I I I I M I I II I t I M t II I M i I I I I M II I II t I) I II I I I I I I I t I I 
orf 133-1 ATFRLPRETELQTTLGFNYFHNEYGKNRFPEELGLFFDGPDQDNGLYSYLGRFKGDKGLL 

390 400 410 420 430 440 

490 500 510 520 530 540 

orf 133ng-l . pep PQKSTIVQPAGSQYFNTFYFDAALKKDIYRLNYSTNAINYRFGGEYTGYYGSENEFKRAF 
I I I I II I I I I I I I I i I II I I I II I I I II II I i t I I t ::: II I II I I M I II I :: II II II 

orf 133-1 PQKSTIVQPAGSQYFNTFYFDAALKKDIYRLNYSTNTVGYRFGGEYTGYYGSDDEFKRAF 
450 460 470 480 490 500 

550 560 570 580 590 600 

orf 133ng-l . pep GENSPAYKEHCDPSCGLYEPVLKKYGKKRANNHSVSISADFGDYFMPFAGYSRTHRMPNI 

lllll:||:lt: I I I : I I II I I I II II II I I I I I I I I II I I I I I I I I t : I I I M I I t I I 
orf 133-1 GENSPTYKKHCNRSCGIYEPVLKKYGKKRANNHSVSISADFGDYFMPFASYSRTHRMPNI 
510 520 530 540 550 560 

610 620 630 640 650 660 

orf 133ng-l . pep QEMYFSQIGDSGVHTALKPERANTWQFGFNTYKKGLLKQDDILGLKLVGYRSRIDNYIHN 
I I I II I II II I I I I I I ! I I I I II I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I 
orf 133-1 QEMYFSQIGDSGVHTALKPERANTWQFGFNTYKKGLLKQDDTLGLKLVGYRSRIDNYIHN 
570 580 590 600 610 620 


670 680 690 700 710 720 

orf 133ng-l . pep VYGKWWDLNGDIPSWVGSTGLAYTIRHRNFKDKVHKHGFELELNYDYGRFFTNLSYAYQK 

I I I I I I I I I M II t I I : I I I I I I I I : I t 1 I I I I I I I I I I I I I I I 1 I t I I I I I I I I I I M t 
orf 133-1 VYGKWWDLNGDIPSWVSSTGLAYTIQHRNFKDKVHKHGFELELNYDYGRFFTNLSYAYQK 

630 640 650 660 670 680 

730 740 750 760 770 780 

orf 133ng-l . pep STQPTNFSDASESPNNASKEDQLKQGYGLSRVSALPRDYGRLEVGTRWLGNKLTLGGAMR 
t II I I I I M I I I I M 1 I I I i I I I II I I I I I I I I M I t I I I t I I t I I t I I II I I 1 I I II I I 
orf 133-1 STQPTNFSDASESPNNASKEDQLKQGYGLSRVSALPRDYGRLEVGTRWLGNKLTLGGAMR 

690 700 710 720 730 740 

790 800 810 820 830 840 

orf 133ng-l . pep YFGKSIRATAEERYIDGTNGGNTSNVRQLGKRSIKQTETLARQPLIFDFYAAYEPKKNLI 
I II I I I I I I I I I I I I I I I I I I t I I I I I I I I I I I I I M I I 1 II I I I I I II M I I II I M 

orf 133-1 YFGKSIRATAEERYIDGTNGGNTSNFRQLGKRSIKQTETLARQPLIFDFYAAYEPKKNLI 
750 760 770 780 790 800 

850 860 870 880 890 900 

orf 133ng-l .pep FRAEVKNLFDRRYIDPLDAGNDAATQRYYSSFDPKDKDEDVTCNADKTLCNGKYGGTSKS 

II I I t I I I I I t I t I I I I I I I I I I I t I I I I I II 1 I II 11 I I M I I I I II t II II I I I I I I I 
orf 133-1 FRAEVKNLFDRRYIDPLDAGNDAATQRYYSSFDPKDKDEDVTCNADKTLCNGKYGGTSKS 

810 820 830 840 850 860 

910 920 

orf 133ng-l .pep VLTNFARGRTFLMTMSYKFX 
I I I I I I I I I I I I I I I I I I I I 
orf 133-1 VLTNFARGRTFLMTMSYKFX 

870 880 



In addition, ORF133ng-1 is homologous to a TonB-dependent receptor in H. influenzae: 
sp|P45114|YC17_HAEIN PROBABLE TONB-DEPENDENT RECEPTOR HI1217 PRECURSOR 
>gi|1075372|pir||G64110 transferrin binding protein 1 precursor (tbp1) homolog - 
Haemophilus influenzae (strain Rd KW20) >gi|1574147 (U32801) transferrin binding 
protein 1 precursor (tbp1) [Haemophilus influenzae] Length = 913 
Score = 930 bits (2377), Expect = 0.0 

Identities = 476/921 (51%), Positives = 619/921 (66%), Gaps = 72/921 (7%) 

Query: 38   QVLEDVHVKAKRVPKDKKVFTDARAVSTRQDVFKSGENLDNIVRSIPGAFTQQDKSSGIV 97 
            + L + V K + DKK FT+A+A STR++VFK + +D ++RSIPGAFTQQDK SG+V 
Sbjct: 29   ETLGQIDWEKVISNDKKPFTEAKAKSTRENVFKETQTIDQVIRSIPGAFTQQDKGSGW 88 

Query: 98   SLNIRGDSGFGRVNTMVDGITQTFYSTSTDAGRAGGSSQFGASVDSNFIAGLDWKGSFS 157 
            S+NIRG++G GRVNTMVDG+TQTFYST+ D+G++GGSSQFGA++D NFIAG+DV K +FS 
Sbjct: 89   SVNIRGENGLGRVNTMVDGVTQTFYSTALDSGQSGGSSQFGAAIDPNFIAGVDVNKSNFS 148 

Query: 158 
            G++GIN+LAGSAN RTLGV+DV+ M RKWL++G 
Sbjct: 149 

Query: 218 
            VGV+YG+S+R V+Q+YR+ GGG+ + + G++ L + K+ YF + G N G+W D 
Sbjct: 209 

Query: 278  LQRQYWK- ... -TKWY- ... -KKYEDPQELQKYIEE 303 
            L +++W ... +Y ... KK +D ++LQK IEE 
Sbjct: 266 

Query: 304  HDKSWRENLAPQYDITPIDPSGLKQQSAGNLFKLEYDGVFNKYTAQFRDLNTRIGSRKII 363 
            DKS+ N QY + PI+P L+ +S +L K EY AQ R L+ +IGSRKI 
Sbjct: 326 

Query: 364  NRNYQFNYGLSLNPYTNLNLTAAYNSGRQKYPKGAKFTGWGLLKDFETYNNAKILDLNNT 423 
            NRNYQ NY + N Y +LNL AA+N G+ YPKG F GW + T N A I+D+NN+ 
Sbjct: 385  NRNYQVNYNFNNNSYLDLNLMAAHNIGKTIYPKGGFFAGWQVADKLITKNVANIVDINNS 444 

Query: 424  ... -LGRFKGDKG 481 
            TF LP+E +L+TTLGFNYF NEY KNRFPEEL LF++ ... D GLYS+ ... GR+ G K 
Sbjct: 445 

Query: 482 
            LLPQ+S I+QP+G Q F T YFD AL K IY LNYS N +Y F GEY GY ... EN+ 
Sbjct: 505  ... 555 

Query: 542 
            + EP+L K G K+A NHS ++SA+ DYFMPF YSRTHRMP 
Sbjct: 556  ... -INEPILHKSGHKKAFNHSATLSAELSDYFMPFFTYSRTHRMP 604 

Query: 602 
            NIQEM+FSQ+ ++GV+TALKPE+++T+Q GFNTYKKGL QDD+LG+KLVGYRS I NYI 
Sbjct: 605 

Query: 662 
            HNVYG WW ... +P+W S G YTI H+N+K V K G ELE+NYD GRFF N+SYAY 
Sbjct: 665 

Query: 722 
            Q++ QPTN++DAS PNNAS+ED LKQGYGLSRVS LP+DYGRLE+GTRW KLTLG A 
Sbjct: 723 

Query: 782 
            RY+GKS RAT EE YI+G+ ... + +R+ ... ++K+TE + +QP+I D + +YEP K+ 
Sbjct: 783 

Query: 842 
            LI +AEV+NL D+RY+DPLDAGNDAA+QRYYSS ... + + C D + C ... GG+ 
Sbjct: 842  ... -NNSIECAQDSSAC- ... -GGSD 892 

Query: 902 
            K+VL NFARGRT++++++YKF 
Sbjct: 893 

The underlined motif in the gonococcal protein (also present in the meningococcal protein) is 
predicted to be an ATP/GTP-binding site motif A (P-loop), and the analysis suggests that these 
proteins from N. meningitidis and N. gonorrhoeae, and their epitopes, could be useful antigens for 
vaccines or diagnostics, or for raising antibodies. 
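
As an illustration of how a motif A (P-loop) prediction of this kind can be reproduced, the following minimal sketch scans a protein sequence for the pattern [AG]-x(4)-G-K-[ST]. That pattern is assumed here to be the one behind the prediction; the text does not state which pattern or program was used, and the function name `find_p_loops` is hypothetical.

```python
import re

# Assumed pattern for ATP/GTP-binding site motif A (P-loop): [AG]-x(4)-G-K-[ST].
# A lookahead is used so that overlapping candidate sites are all reported.
P_LOOP = re.compile(r"(?=([AG].{4}GK[ST]))")


def find_p_loops(protein: str):
    """Return (1-based start, matched residues) for each P-loop-like site."""
    seq = "".join(protein.split()).upper()  # drop the 10-residue block spacing
    return [(m.start() + 1, m.group(1)) for m in P_LOOP.finditer(seq)]


# Residues 771-800 of the gonococcal sequence as printed in the listing above.
fragment = "WLGNKLTLGG AMRYFGKSIR ATAEERYIDG"
print(find_p_loops(fragment))
```

Run on this 30-residue fragment, the scan reports one candidate site; whether it is the motif underlined in the original listing cannot be confirmed from the text.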

Example 104 

The following partial DNA sequence was identified in N. meningitidis <SEQ ID 885>: 

1 ATGAACCTGA TTTCACGTTA CATCATCCGT CAAATGGCGG TTATGGCGGT 

51 TTACGCGCTC CTTGCCTTCC TCGCTTTGTA CAGCTTTTTT GAAATCCTGT 

101 ACGAAACCGG CAACCTCGGC AAAGGCAGTT ACGGCATATG GGAAATGCTG 

151 GGCTACACCG CCCTCAAAAT GCCCGCCCGC GCCTACGAAC TGATTCCCCT 

201 CGCCGTCCTT ATCGGCGGAC TGGTCTCCCT CAGCCAGCTT GCCGCCGGCA 

251 GCGAACTGAC CGTCATCAAA GCCAGCGGCA TGAGCACCAA AAAGCTGCTG 

301 TTGATTCTGT CGCAGTTCGG TTTTATTTTT GCTATTGCCA CCGTCGCGCT 

351 CGGCGAATGG GTTGCGCCCA CACTGAGCCA AAAAGCCGAA AACATCAAAG 

401 CCGCCGCCAT CAACGGCAAA ATCAGCACCG GCAATACCGG CCTTTGGCTG 

451 AAAGAAAAAA ACAGCGTGAT CAATGTGCGC GAAATGTTGC CCGACCAT. . 

This corresponds to the amino acid sequence <SEQ ID 886; ORF112>: 

1 MNLISRYIIR QMAVMAVYAL LAFLALYSFF EILYETGNLG KGSYGIWEML 

51 GYTALKMPAR AYELIPLAVL IGGLVSLSQL AAGSELTVIK ASGMSTKKLL 

101 LILSQFGFIF AIATVALGEW VAPTLSQKAE NIKAAAINGK ISTGNTGLWL 

151 KEKNSVINVR EMLPDH... 
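
The correspondence between a nucleotide listing and its amino acid sequence can be checked with a short translation routine. The sketch below assumes the standard genetic code and reading frame 1; the helper name `translate` is hypothetical, and any codon containing an ambiguity code (for example `r`, `k`, `s` or `N`) is reported as `X`.

```python
from itertools import product

# Standard genetic code, codons ordered TTT, TTC, TTA, TTG, TCT, ... GGG.
BASES = "TCAG"
AMINO = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {"".join(c): aa for c, aa in zip(product(BASES, repeat=3), AMINO)}


def translate(dna: str) -> str:
    """Translate frame 1 of a DNA listing; unknown codons become 'X'."""
    # Keep letters only, so position numbers, spaces and trailing dots are ignored.
    seq = "".join(ch for ch in dna.upper() if ch.isalpha())
    usable = len(seq) - len(seq) % 3  # ignore an incomplete final codon
    return "".join(CODON_TABLE.get(seq[i:i + 3], "X") for i in range(0, usable, 3))


# First 30 nucleotides of the partial sequence listed above.
print(translate("1 ATGAACCTGA TTTCACGTTA CATCATCCGT"))
```

The output for these 30 nucleotides matches the first ten residues of the protein listing.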

Further work revealed the further partial nucleotide sequence <SEQ ID 887>: 

1 ATGAACCTGA TTTCACGTTA CATCATCCGT CAAATGGCGG TTATGGCGGT 

51 TTACGCGCTC CTTGCCTTCC TCGCTTTGTA CAGCTTTTTT GAAATCCTGT 

101 ACGAAACCGG CAACCTCGGC AAAGGCAGTT ACGGCATATG GGAAATGCTG 

151 gGCTACACCG CCCTCAAAAT GCCCGCCCGC GCCTACGAAC TGATTCCCCT 

201 CGCCGTCCTT ATCGGCGGAC TGGTCTCCCT CAGCCAGCTT GCCGCCGGCA 

251 GCGAACTGAC CGTCATCAAA GCCAGCGGCA TGAGCACCAA AAAGCTGCTG 

301 TTGATTCTGT CGCAGTTCGG TTTTATTTTT GCTATTGCCA CCGTCGCGCT 

351 CGGCGAATGG GTTGCGCCCA CACTGAGCCA AAAAGCCGAA AACATCAAAG 

401 CCGCCGCCAT CAACGGCAAA ATCAGCACCG GCAATACCGG CCTTTGGCTG 

451 AAAGAAAAAA ACAGCrTkAT CAATGTGCGC GAAATGTTGC CCGACCATAC 

501 GCTTTTGGGC ATCAAAATTT GGGCGCGCAA CGATAAAAAC GAATTGGCAG 

551 AGGCAGTGGA AGCCGATTCC GCCGTTTTGA ACAGCGACGG CAGTTGGCAG 

601 TTGAAAAACA TCCGCCGCAG CACGCTTGGC GAAGACAAAG TCGAGGTCTC 

651 TATTGCGGCT GAAGAAAACT GGCCGATTTC CGTCAAACGC AACCTGATGG 

701 ACGTATTGCT CGTCAAACGC GACCAAATGT CCGTCGGCGA ACTGACCACC 

751 TACATCCGCC ACCTCCAAAA CAACAGCCAA AACACCCGAA TCTACGCCAT 

801 CGCATGGTGG CGCAAATTGG TTTACCCCGC CGCAGCCTGG GTGATGGCGC 

851 TCGTCGCCTT TGCCTTTACC CCGCAAACCA CCCGCCACGG CAATATGGGC 

901 TTAAAACTCT TCGGCGGCAT CTGTsTCGGA TTGCTGTTCC ACCTTGCCGG 

951 ACGGCTCTTT GGGTTTACCA GCCAACTCGG. . . 

This corresponds to the amino acid sequence <SEQ ID 888; ORF112-1>: 

1 MNLISRYIIR QMAVMAVYAL LAFLALYSFF EILYETGNLG KGSYGIWEML 

51 GYTALKMPAR AYELIPLAVL IGGLVSLSQL AAGSELTVIK ASGMSTKKLL 

101 LILSQFGFIF AIATVALGEW VAPTLSQKAE NIKAAAINGK ISTGNTGLWL 

151 KEKNSXINVR EMLPDHTLLG IKIWARNDKN ELAEAVEADS AVLNSDGSWQ 

201 LKNIRRSTLG EDKVEVSIAA EENWPISVKR NLMDVLLVKP DQMSVGELTT 

251 YIRHLQNNSQ NTRIYAIAWW RKLVYPAAAW VMALVAFAFT PQTTRHGNMG 

301 LKLFGGICXG LLFHLAGRLF GFTSQL... 

Computer analysis of this amino acid sequence predicts two transmembrane domains and gave the 
following results: 

Homology with a predicted ORF from N. meningitidis (strain A) 

ORF112 shows 96.4% identity over a 166 aa overlap with an ORF (ORF112a) from strain A of 
N. meningitidis: 

10 20 30 40 50 60 

or f 112 . pep MNLISRYIIRQMAVMAVYALLAFLALYSFFEILYETGNLGKGSYGIWEMLGYTALKMPAR 

I I t 1 1 I I I M M I I I M i I I I I i I t I I I I I I i I I I I I I I I M I M I I It I I I I I I I II 
orfll2a MNLISRYIIRQMAVMAVYALLAFLALYSFFEILYETGNLGKGSYGIWEMXGYTALKMXAR 

10 20 30 40 50 60 

70 80 90 100 110 120 

orf 112 . pep AYELIPLAVLIGGLVSLSQLAAGSELTVIKASGMSTKKLLLILSQFGFIFAIATVALGEW 

II II: 1 I t I I I I I I I t lllllllll:llilllMMIIIIIIIIIilllllllllllll 
orf 112a AYELMPLAVLIGGLVSXSQLAAGSELXVIKASGMSTKKLLLILSQFGFIFAIATVALGEW 

70 80 90 100 110 120 



130 140 150 160 

orf 112 . pep VAPTLSQKAENIKAAAINGKISTGNTGLWLKEKNSVINVREMLPDH 
I t I M II I I II I II I II II I I I II I I I I I I I I I I I : II II II II M 
orf 112a VAPTLSQKAENIKAAAINGKISTGNTGLWLKEKNSIINVREMLPDHTLLGIKIWARNDKN 

130 140 150 160 170 180 



orf 112a ELAEAVEADSAVLNSDGSWQLKNIRRSTLGEDKVEVSIAAEEXWPISVKRNLMDVLLVKP 
190 200 210 220 230 240 
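
The identity figure quoted for this alignment can be recomputed from the two aligned sequences. The sketch below is a minimal illustration that assumes a gap-free overlap and counts a position as identical only when both residues are the same letter, so an undetermined residue (`X`) opposite a known one counts as a mismatch; the function name `percent_identity` is hypothetical.

```python
def percent_identity(a: str, b: str) -> float:
    """Percentage of aligned positions carrying the same residue."""
    a, b = "".join(a.split()), "".join(b.split())  # drop block spacing
    if len(a) != len(b):
        raise ValueError("aligned sequences must have equal length")
    identical = sum(x == y and x != "-" for x, y in zip(a, b))
    return 100.0 * identical / len(a)


# The 166 aa overlap between ORF112 and ORF112a, taken from the listings above.
orf112 = """
MNLISRYIIR QMAVMAVYAL LAFLALYSFF EILYETGNLG KGSYGIWEML GYTALKMPAR
AYELIPLAVL IGGLVSLSQL AAGSELTVIK ASGMSTKKLL LILSQFGFIF AIATVALGEW
VAPTLSQKAE NIKAAAINGK ISTGNTGLWL KEKNSVINVR EMLPDH
"""
orf112a = """
MNLISRYIIR QMAVMAVYAL LAFLALYSFF EILYETGNLG KGSYGIWEMX GYTALKMXAR
AYELMPLAVL IGGLVSXSQL AAGSELXVIK ASGMSTKKLL LILSQFGFIF AIATVALGEW
VAPTLSQKAE NIKAAAINGK ISTGNTGLWL KEKNSIINVR EMLPDH
"""
print(round(percent_identity(orf112, orf112a), 1))
```

Six of the 166 positions differ, which gives 160/166 and agrees with the 96.4% stated above.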

The ORF112a nucleotide sequence <SEQ ID 889> is: 



1 ATGAACCTGA TTTCACGTTA CATCATCCGT CAAATGGCGG TTATGGCGGT 

51 TTACGCGCTC CTTGCCTTCC TCGCTTTGTA CAGCTTTTTT GAAATCCTGT 

101 ACGAAACCGG CAACCTCGGC AAAGGCAGTT ACGGCATATG GGAAATGNTG 

151 GGNTACACCG CCCTCAAAAT GNCCGCCCGC GCCTACGAAC TGATGCCCCT 

201 CGCCGTCCTT ATCGGCGGAC TGGTCTCTNT CAGCCAGCTT GCCGCCGGCA 

251 GCGAACTGAN CGTCATCAAA GCCAGCGGCA TGAGCACCAA AAAGCTGCTG 

301 TTGATTCTGT CGCAGTTCGG TTTTATTTTT GCTATTGCCA CCGTCGCGCT 

351 CGGCGAATGG GTTGCGCCCA CACTGAGCCA AAAAGCCGAA AACATCAAAG 

401 CCGCGGCCAT CAACGGCAAA ATCAGTACCG GCAATACCGG CCTTTGGCTG 

451 AAAGAAAAAA ACAGCATTAT CAATGTGCGC GAAATGTTGC CCGACCATAC 

501 CCTGCTGGGC ATTAAAATCT GGGCCCGCAA CGATAAAAAC GAACTGGCAG 

551 AGGCAGTGGA AGCCGATTCC GCCGTTTTGA ACAGCGACGG CAGTTGGCAG 

601 TTGAAAAACA TCCGCCGCAG CACGCTTGGC GAAGACAAAG TCGAGGTCTC 

651 TATTGCGGCT GAAGAAAANT GGCCGATTTC CGTCAAACGC AACCTGATGG 

701 ACGTATTGCT CGTCAAACGC GACCAAATGT CCGTCGGCGA ACTGACCACC 

751 TACATCCGCC ACCTCCAAAN NNACAGCCAA AACACCCGAA TCTACGCCAT 

801 CGCATGGTGG CGCAAATTGG TTTACCCCGC CGCAGCCTGG GTGATGGCGC 

851 TCGTCGCCTT TGCCTTTACC CCGCAAACCA CCCGCCACGG CAATATGGGC 

901 TTAAAANTCT TCGGCGGCAT CTGTCTCGGA TTGCTGTTCC ACCTTGCCGG 

951 NCGGCTCTTC NGGTTTACCA GCCAACTCTA CGGCATCCCG CCCTTCCTCG 

1001 NCGGCGCACT ACCTACCATA GCCTTCGCCT TGCTCGCCGT TTGGCTGATA 

1051 CGCAAACAGG AAAAACGCTA A 

This encodes a protein having the amino acid sequence <SEQ ID 890>: 



1 MNLISRYIIR QMAVMAVYAL LAFLALYSFF EILYETGNLG KGSYGIWEMX 

51 GYTALKMXAR AYELMPLAVL IGGLVSXSQL AAGSELXVIK ASGMSTKKLL 

101 LILSQFGFIF AIATVALGEW VAPTLSQKAE NIKAAAINGK ISTGNTGLWL 

151 KEKNSIINVR EMLPDHTLLG IKIWARNDKN ELAEAVEADS AVLNSDGSWQ 

201 LKNIRRSTLG EDKVEVSIAA EEXWPISVKR NLMDVLLVKP DQMSVGELTT 

251 YIRHLQXXSQ NTRIYAIAWW RKLVYPAAAW VMALVAFAFT PQTTRHGNMG 

301 LKXFGGICLG LLFHLAGRLF XFTSQLYGIP PFLXGALPTI AFALLAVWLI 

351 RKQEKR* 

ORF112a and ORF112-1 show 96.3% identity in 326 aa overlap: 



orf112a.pep MNLISRYIIRQMAVMAVYALLAFLALYSFFEILYETGNLGKGSYGIWEMXGYTALKMXAR 
1 1 1 II 1 1 II II M 1 1 1 1 1 1 1 1 1 1 1 II II II 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 II 1 1 1 1 1 II n 
orf112-1 MNLISRYIIRQMAVMAVYALLAFLALYSFFEILYETGNLGKGSYGIWEMLGYTALKMPAR 

orf112a.pep AYELMPLAVLIGGLVSXSQLAAGSELXVIKASGMSTKKLLLILSQFGFIFAIATVALGEW 
I 111:1 Mil I III 11 nil 11111:1111 ItlllMMIillin II I Mill II 11 I 
orf112-1 AYELIPLAVLIGGLVSLSQLAAGSELTVIKASGMSTKKLLLILSQFGFIFAIATVALGEW 

orf112a.pep VAPTLSQKAENIKAAAINGKISTGNTGLWLKEKNSIINVREMLPDHTLLGIKIWARNDKN 
I I I I II I M II II M I M M I I I I I I II II M I II I I I I I II I I II II II II I M t 1! I 
orf112-1 VAPTLSQKAENIKAAAINGKISTGNTGLWLKEKNSXINVREMLPDHTLLGIKIWARNDKN 

orf112a.pep ELAEAVEADSAVLNSDGSWQLKNIRRSTLGEDKVEVSIAAEEXWPISVKRNLMDVLLVKP 
I II I II I I I I II i M If M I I II II i II I II I M I I I II II I II II I t 11 I II I I M I I 
orf112-1 ELAEAVEADSAVLNSDGSWQLKNIRRSTLGEDKVEVSIAAEENWPISVKRNLMDVLLVKP 

orf112a.pep DQMSVGELTTYIRHLQXXSQNTRIYAIAWWRKLVYPAAAWVMALVAFAFTPQTTRHGNMG 
I M I I M I I I I II I M I I I I I I I I II II I t I I I I I II I II II I M I II II M M I M i 
orf112-1 DQMSVGELTTYIRHLQNNSQNTRIYAIAWWRKLVYPAAAWVMALVAFAFTPQTTRHGNMG 

orf112a.pep LKXFGGICLGLLFHLAGRLFXFTSQLYGIPPFLXGALPTIAFALLAVWLIRKQEKRX 
II I I M I I I I II II i I I I t I I t I 
orf112-1 LKLFGGICXGLLFHLAGRLFGFTSQL 

Homology with a predicted ORF from N. gonorrhoeae 

ORF112 shows 95.8% identity over a 166 aa overlap with a predicted ORF (ORF112ng) from 
N. gonorrhoeae: 

orf112.pep MNLISRYIIRQMAVMAVYALLAFLALYSFFEILYETGNLGKGSYGIWEMLGYTALKMPAR 60 
I I t I I I i [ i I I I M I M I II I II II M I II I I II I M II I I t M I I I 11 I II I I I I M II 
orf112ng MNLISRYIIRQMAVMAVYALLAFLALYSFFEILYETGNLGKGSYGIWEMLGYTALKMPAR 60 

orf112.pep AYELIPLAVLIGGLVSLSQLAAGSELTVIKASGMSTKKLLLILSQFGFIFAIATVALGEW 120 
I I I I: M I I M I II : I II I I I I I I It: I I I M I I I II II M I II I II I I I II I : M I M I 
orf112ng AYELMPLAVLIGGLASLSQLAAGSELAVIKASGMSTKKLLLILSQFGFIFAIAAVALGEW 120 

orf112.pep VAPTLSQKAENIKAAAINGKISTGNTGLWLKEKNSVINVREMLPDH 166 
IIIMIIIIItlllllMIIIIIIIIIMIMI:l:llll lllll 
orf112ng VAPTLSQKAENIKAAAINGKISTGNTGLWLKEKTSIINVRGMLPDHTLLGIKIWARNDKN 180 



The complete length ORF112ng nucleotide sequence <SEQ ID 891> is: 

1 ATGAACCTGA TTTCACGTTA CATCATCCGC CAAATGGCGG TTATGGCGGT 

51 TTACGCGCTC CTTGCCTTCC TCGCTTTGTA CAGCTTTTTT GAAATCCTGT 

101 ACGAAACCGG CAACCTCGGC AAAGGCAGTT ACGGCATATG GGAAATGCTG 

151 GGCTACACCG CCCTCAAAAT GCCCGCCCGC GCCTACGAAC TCATGCCCCT 

201 CGCCGTCCTC ATCGGCGGAC TGGCCTCTCT CAGCCAGCTT GCCGCCGGCA 

251 GCGAACTGGC CGTCATCAAA GCCAGCGGCA TGAGCACCAA AAAGCTGCTG 

301 TTGATTCTGT CTCAGTTCGG TTTTATTTTT GCTATTGCCG CCGTCGCGCT 

351 CGGCGAATGG GTTGCGCCCA CGCTGAGCCA AAAAGCCGAA AACATCAAag 

401 cCGCCGCCAt taacggCAAA ATCAGCAccg gcAATACCGG CCTTTggcTG 

451 AAAGAAAAAa ccAGCATTAT CAATGTGcGc GGAATGTTGC CCGACCATAC 

501 GCTTTTGGGC ATCAAAATTT GGGCGCGCAA CGATAAAAAC GAATTGGCAG 

551 AGGCAGTGGA AGCCGATTCC GCCGTTTTGA ACAGCGACGG CAGCTGGCAG 

601 TTGAAAAACA TCCGCCGCAG CATCATGGGT ACAGACAAAA TCGAAACATC 

651 cgCCGCCGCC GAAGAAACTT gGCCGATTGC CGTCAGACGC AACCTGATGG 

701 ACGTATTGCT CGTCAAGCCC GACCAAATGT CCGTCGGCGA GCTGACCACC 

751 TACATCCGCC ACCTCCAAAA CAACAGCCAA AACACCCAAA TCTACGCCAT 

801 CGCATGGTGG CGTAAACTCG TTTACCCCGT CGCCGCATGG GTCATGGCGC 

851 TCGTTGCCTT CGCCTTTACG CCGCAAACCA CGCGCCACGG CAATATGGGC 

901 TTAAAACTCT TCGGCGGCAT CTGTCTCGGA TTGCTGTTCC ACCTTGCCGG 

951 CAGGCTCTTC GGGTTTACCA GCCAACTCTA CGGCACCCCA CCCTTCCTCG 

1001 CCGGCGCACT GCCTACCATA GCCTTCGCCT TGCTCGCTGT TTGGCTGATA 

1051 CGCAAACAGG AAAAACGTTG A 


This encodes a protein having amino acid sequence <SEQ ID 892>: 

1 MNLISRYIIR QMAVMAVYAL LAFLALYSFF EILYETGNLG KGSYGIWEML 

51 GYTALKMPAR AYELMPLAVL IGGLASLSQL AAGSELAVIK ASGMSTKKLL 

101 LILSQFGFIF AIAAVALGEW VAPTLSQKAE NIKAAAINGK ISTGNTGLWL 

151 KEKTSIINVR GMLPDHTLLG IKIWARNDKN ELAEAVEADS AVLNSDGSWQ 

201 LKNIRRSIMG TDKIETSAAA EETWPIAVRR NLMDVLLVKP DQMSVGELTT 

251 YIRHLQNNSQ NTQIYAIAWW RKLVYPVAAW VMALVAFAFT PQTTRHGNMG 

301 LKLFGGICLG LLFHLAGRLF GFTSQLYGTP PFLAGALPTI AFALLAVWLI 

351 RKQEKR* 

ORF112ng and ORF112-1 show 94.2% identity in 326 aa overlap: 



10 20 30 40 50 60 

orf112ng MNLISRYIIRQMAVMAVYALLAFLALYSFFEILYETGNLGKGSYGIWEMLGYTALKMPAR 
I I I I I I I I I M I I I I I I I I I I I I I I t I I I I I t I I t t I I t I I I I I I M t I I I I I I I I I I I I 
orf 112-1 MNLISRYIIRQMAVMAVYALLAFLALYSFFEILYETGNLGKGSYGIWEMLGYTALKMPAR 

10 20 30 40 50 60 






70 80 90 100 110 120 

orfll2ng AYELMPLAVLIGGLASLSQLAAGSELAVIKASGMSTKKLLLILSQFGFIFAIAAVALGEW 
lll|:tlMllllt:tll)l[lllll:lllllilltMIIIMIIIIIIIill:ltllll 
orf 112-1 AYELIPLAVLIGGLVSLSQLAAGSELTVIKASGMSTKKLLLILSQFGFIFAIATVALGEW 

70 80 90 100 110 120 

130 140 150 160 170 180 

orf112ng VAPTLSQKAENIKAAAINGKISTGNTGLWLKEKTSIINVRGMLPDHTLLGIKIWARNDKN 
I I I I I i I t I I I I i I I t I I I i I I I t t n I I I I I I : I Mil I t II I i I t I I I I I I I I I II 
orf112-1 VAPTLSQKAENIKAAAINGKISTGNTGLWLKEKNSXINVREMLPDHTLLGIKIWARNDKN 

130 140 150 160 170 180 

190 200 210 220 230 240 

orfll2ng ELAEAVEADSAVLNSDGSWQLKNIRRSIMGTDKIETSAAAEETWPIAVRRNLMDVLLVKP 
I I I I I II t I I I I II II II M I II I I II : I I I : I : I II II : I II : I : II i t i I I II I I 
orf 112-1 ELAEAVEADSAVLNSDGSWQLKNIRRSTLGEDKVEVSIAAEENWPISVKRNLMDVLLVKP 

190 200 210 220 230 240 



250 260 270 280 290 300 

orf112ng DQMSVGELTTYIRHLQNNSQNTQIYAIAWWRKLVYPVAAWVMALVAFAFTPQTTRHGNMG 
I I I I I I I I I I I i It I I II II II : I I I I I I I I II Ml : I I I I I I I I I II II II I t t I I I I I 
orf112-1 DQMSVGELTTYIRHLQNNSQNTRIYAIAWWRKLVYPAAAWVMALVAFAFTPQTTRHGNMG 

250 260 270 280 290 300 






310 320 330 340 350 

orfll2ng LKLFGGICLGLLFHLAGRLFGFTSQLYGTPPFLAGALPTIAFALLAVWLIRKQEKRX 

I I I I I I II I I I II t I I II I I I I I I I 
orf 112-1 LKLFGGICXGLLFHLAGRLFGFTSQL 

310 320 



This analysis suggests that these proteins from N. meningitidis and N. gonorrhoeae, and their 
epitopes, could be useful antigens for vaccines or diagnostics, or for raising antibodies. 



It will be appreciated that the invention has been described by means of example only, and that 
modifications may be made whilst remaining within the spirit and scope of the invention. 

TABLE I - PCR primers 

ORF      Primer   Sequence  Restriction sites 

ORF1     Forward  CGCGGATCCGCTAGC-GGACACACTTATTTCGG  BamHI-NheI 
         Reverse  CCCGCTCGAG-CCAGCGGTAGCCTAATT  XhoI 
ORF2     Forward  GCGGATCCCATATG-TTTGATTTCGGTTTGGG  BamHI-NdeI 
         Reverse  CCCGCTCGAG-GACGGCATAACGGCG  XhoI 
ORF2-1   Forward  GCGGATCCCATATG-TTTGATTTCGGTTTGGG  BamHI-NdeI 
         Reverse  CCCGCTCGAG-TGATTTACGGACGCGCA  XhoI 
ORF4     Forward  GCGGATCCCATATG-TGCGGAGGTCAAAAAGAC  BamHI-NdeI 
         Reverse  CCCGCTCGAG-TTTGGCTGCGCCTTC  XhoI 
ORF5     Forward  GGAATTCCATATGGCCATGG-TGGAAGGCGCACAACC  NdeI-NcoI 
         Forward  CGGGATCC-ATGGAAGGCGCACAAC  BamHI 
         Reverse  CCCGCTCGAG-GACTGTGCAAAAACGG  XhoI 
ORF6     Forward  CGCGGATCCCATATG-ACCCGTCAATCTCTGCA  BamHI-NdeI 
         Reverse  CCCGCTCGAG-TGCGCCGAACACTTTC  XhoI 
ORF7     Forward  CGCGGATCCGCTAGC-GCGCTGCTTTTTGTTCC  BamHI-NheI 
         Reverse  CCCGCTCGAG-TTTCAAAATATATTTGCGGA  XhoI 
ORF8     Forward  GCGGATCCCATATG-GCTCAACTGCTTCGTAC  BamHI-NdeI 
         Reverse  CCCGCTCGAG-AGCAGGCTTTGGCGC  XhoI 
ORF9     Forward  CGCGGATCCCATATG-CCGAAGGAAGTCGGAAA  BamHI-NdeI 
         Reverse  CCCGCTCGAG-TTTCCGAGGTTTTCGGG  XhoI 
ORF10    Forward  GCGGATCCCATATG-GACACAAAAGAAATCCTC  BamHI-NdeI 
         Reverse  CCCGCTCGAG-TAATGGGAAACCTTGTTTT  XhoI 
ORF11    Forward  GCGGATCCCATATG-GCGGTCAACCTCTACG  BamHI-NdeI 
         Reverse  CCCGCTCGAG-GGAAACGACTTCGCC  XhoI 
[ORF label illegible] 
         Forward  CGCGGATCCCATATG-GCTCTGCTTTCCGCGC  BamHI-NdeI 
         Reverse  CCCGCTCGAG-AGGGTGTGTGATAATAAG  XhoI 
ORF15    Forward  GGAATTCCATATGGCCATGG-GCGGGACACTGACAG  NdeI-NcoI 
         Forward  CGGGATCC-TGCGGGACACTGACAGG  BamHI 
         Reverse  CCCGCTCGAG-AGGTTGGCCTTGTCTATG  XhoI 
ORF17    Forward  GGAATTCCATATGGCCATGG-TTGCCGGCCTGTTCG  NdeI-NcoI 
         Forward  CGGGATCC-ATTGCCGGCCTGTTCG  BamHI 
         Reverse  CCCGCTCGAG-AAGCAGGTTGTACAGC  XhoI 
ORF18    Forward  GCGGATCCCATATG-ATTTTGCTGCATTTGGAT  BamHI-NdeI 
         Reverse  CCCGCTCGAG-TCTTCCAATTTCTGAAAGC  XhoI 
ORF19    Forward  GGAATTCCATATGGCCATGG-TCGCCAGTGTTTTTACC  NdeI-NcoI 
         Forward  CGGGATCC-TTCGCCAGTGTTTTTACCG  BamHI 
         Reverse  CCCGCTCGAG-GGTGTTTTTGAAGCTGCC  XhoI 
ORF20    Forward  GGAATTCCATATGGCCATGG-TCGGCGCGGGTATG  NdeI-NcoI 
         Forward  CGGGATCC-TTCGGCGCGGGTATG  BamHI 
         Reverse  CCCGCTCGAG-CGGCGAGCGAGAGCA  XhoI 
ORF22    Forward  GGAATTCCATATGGCCATGG-TGATTAAAATCAAAAAAGGTCT  NdeI-NcoI 
         Forward  CGGGATCC-ATGATTAAAATCAAAAAAGGTCTAAACC  BamHI 
         Reverse  CCCGCTCGAG-ATTATGATAGCGGCCC  XhoI 
ORF23    Forward  CGCGGATCCCATATG-GATGTTTCTGTTTCAGAC  BamHI-NdeI 
         Reverse  CCCGCTCGAG-TTTAAACCGATAGGTAAACG  XhoI 
ORF24    Forward  GGAATTCCATATGGCCATGG-TGATGCCGGAAATGGTG  NdeI-NcoI 
         Forward  CGGGATCC-ATGATGCCGGAAATGGTG  BamHI 
         Reverse  CCCGCTCGAG-TGTCAGCGTGGCGCA  XhoI 
ORF25    Forward  GCGGATCCCATATG-TATCGCAAACTGATTGC  BamHI-NdeI 
         Reverse  CCCGCTCGAG-ATCGATGGAATAGCCG  XhoI 
ORF26    Forward  GCGGATCCCATATG-CAGCTGATCGACTATTC  BamHI-NdeI 
         Reverse  CCCGCTCGAG-GACATCGGCGCGTTTT  XhoI 
ORF27    Forward  GGAATTCCATATGGCCATGG-AGACCTATTCTGTTTA  NdeI-NcoI 
         Forward  CGGGATCC-CAGACCTATTCTGTTTATTTTAATC  BamHI 
         Reverse  CCCGCTCGAG-GGGTTCGATTAAATAACCAT  XhoI 
ORF28    Forward  GGAATTCCATATGGCCATGG-ACGGCTGTACGTTGATGT  NdeI-NcoI 
         Forward  CGGGATCC-AACGGCTGTACGTTGATG  BamHI 
         Reverse  CCCGCTCGAG-TTTGTCAGAGGAATTCGCG  XhoI 
ORF29    Forward  GCGGATCCCATATG-AACGGTTTGGATGCCCG  BamHI-NdeI 
         Forward  CGCGGATCCGCTAGC-AACGGTTTGGATGCCCG  BamHI-NheI 
         Reverse  CCCGCTCGAG-TTTGTCTAAGTTCCTGATATG  XhoI 
ORF32    Forward  CGCGGATCCCATATG-AATACTCCTCCTTTTG  BamHI-NdeI 
         Reverse  CCCGCTCGAG-GCGTATTTTTTGATGCTTTG  XhoI 

ORF33    Forward  GCGGATCCCATATG-ATTGATAGGGATCGTATG  BamHI-NdeI 
         Reverse  CCCGCTCGAG-TTGATCTTTCAAACGGCC  XhoI 
ORF35    Forward  GCGGATCCCATATG-TTCAGAGCTCAGCTT  BamHI-NdeI 
         Forward  CGCGGATCCGCTAGC-TTCAGAGCTCAGCTT  BamHI-NheI 
         Reverse  CCCGCTCGAG-AAACAGCCATTTGAGCGA  XhoI 
ORF37    Forward  GCGGATCCCATATG-GATGACGTATCGGATTTT  BamHI-NdeI 
         Reverse  CCCGCTCGAG-ATAGCCCGCTTTCAGG  XhoI 
ORF58    Forward  CGCGGATCCGCTAGC-TCCGAACGCGAGTGGAT  BamHI-NheI 
         Reverse  CCCGCTCGAG-AGCATTGTCCAAGGGGAC  XhoI 
ORF65    Forward  GGAATTCCATATGGCCATGG-TGCTGTATCTGAATCAAG  NdeI-NcoI 
         Forward  CGGGATCC-TTGCTGTATCTGAATCAAGG  BamHI 
         Reverse  CCCGCTCGAG-CCGCATCGGCAGACA  XhoI 
ORF66    Forward  GCGGATCCCATATG-TACGCATTTACCGCCG  BamHI-NdeI 
         Reverse  CCCGCTCGAG-TGGATTTTGCAGAGATGG  XhoI 
ORF72    Forward  CGCGGATCCCATATG-AATGCAGTAAAAATATCTGA  BamHI-NdeI 
         Reverse  CCCGCTCGAG-GCCTGAGACCTTTGCAA  XhoI 
ORF73    Forward  GCGGATCCCATATG-AGATTTTTCGGTATCGG  BamHI-NdeI 
         Reverse  CCCGCTCGAG-TTCATCTTTTTCATGTTCG  XhoI 
ORF75    Forward  GCGGATCCCATATG-TCTGTCTTTCAAACGGC  BamHI-NdeI 
         Reverse  CCCGCTCGAG-TTTGTTTTTGCAAGACAG  XhoI 
ORF76    Forward  GATCAGCTAGCCATATG-AAACAGAAAAAAACCGC  NheI-NdeI 
         Reverse  CGGGATCC-TTACGGTTTGACACCGTT  BamHI 
ORF79    Forward  CGCGGATCCCATATG-GTTTCCGCCGCCG  BamHI-NdeI 
         Reverse  CCCGCTCGAG-GTGCTGATGCGCTTCG  XhoI 
ORF83    Forward  GCGGATCCCATATG-AAAACCCTGCTGCTGC  BamHI-NdeI 
         Reverse  CCCGCTCGAG-GCCGCCTTTGCGGC  XhoI 
ORF84    Forward  GCGGATCCCATATG-GCAGAGATCTGTTTG  BamHI-NdeI 
         Reverse  CCCGCTCGAG-GTTTGCCGATCCGACCA  XhoI 
ORF85    Forward  CGCGGATCCCATATG-GCGGTTTGGGGCGGA  BamHI-NdeI 
         Reverse  CCCGCTCGAG-TCGGCGCGGCGGGC  XhoI 
ORF89    Forward  GGAATTCCATATGGCCATGG-CCATACCTTCTTATCA  NdeI-NcoI 
         Forward  CGGGATCC-GCCATACCTTCTTATCAGAG  BamHI 
         Reverse  CCCGCTCGAG-TTTTTTGCGATTAGAAAAAGC  XhoI 
ORF97    Forward  GCGGATCCCATATG-CATCCTGCCAGCGAAC  BamHI-NdeI 
         Reverse  CCCGCTCGAG-TTCGCCTACGGTTTTTTG  XhoI 


ORF98    Forward  GCGGATCCCATATG-ACGGTAACTGCGG  BamHI-NdeI 
         Reverse  CCCGCTCGAG-TTGTTGTTCGGGCAAATC  XhoI 
ORF100   Forward  GCGGATCCCATATG-TCGGGCATTTACACCG  BamHI-NdeI 
         Reverse  CCCGCTCGAG-ACGGGTTTCGGCGGAA  XhoI 
ORF101   Forward  GCGGATCCCATATG-ATTTATCAAAGAAACCTC  BamHI-NdeI 
         Reverse  CCCGCTCGAG-TTTTCCGCCTTTCAATGT  XhoI 
ORF102   Forward  GCGGATCCCATATG-GCAGGGCTGTTTTACC  BamHI-NdeI 
         Reverse  CCCGCTCGAG-AAACGGTTTGAACACGAC  XhoI 
ORF103   Forward  GCGGATCCCATATG-AACCACGACATCAC  BamHI-NdeI 
         Reverse  CCCGCTCGAG-CAGCCACAGGACGGC  XhoI 
ORF104   Forward  GCGGATCCCATATG-ACGTGGGG7WVCGC  BamHI-NdeI 
         Reverse  CCCGCTCGAG-GCGGCGTTTGAACGGC  XhoI 
ORF105   Forward  GCGGATCCCATATG-ACCAAATTTCAAACCCCTC  BamHI-NdeI 
         Reverse  CCCGCTCGAG-TAAACGAATGCCGTCCAG  XhoI 
ORF106   Forward  GCGGATCCCATATG-AGGATAACCGACGGCG  BamHI-NdeI 
         Reverse  CCCGCTCGAG-TTTGTTCCCGATGATGTT  XhoI 
ORF109   Forward  GCGGATCCCATATG-GAAGATTTATATATAATACTCG  BamHI-NdeI 
         Reverse  CCCGCTCGAG-ATCAGCTTCGAACCGAAG  XhoI 
ORF110   Forward  AAAGAATTC-ATGAGTAAATCCCGTAGATCTCCC  EcoRI 
         Reverse  AAACTGCAG-GGAAAACCACATCCGCACTCTGCC  PstI 
ORF111   Forward  AAAGAATTC-GCACCGCAAAAGGCAAAAACCGCA  EcoRI 
         Reverse  AAACTGCAG-TCTGCGCGTTTTCGGGCAGGGTGG  PstI 
ORF113   Forward  AAAGAATTC-ATGAACAAAACCCTCTATCGTGTGATTTTCAACCG  EcoRI 
         Reverse  AAACTGCAG-TTACGJU^TGCCTGCTTGCTCGACCGTACTG  PstI 
ORF115   Forward  AAAGAATTC-TTGCTTGTGCAAACAGAAAAAGACGG  EcoRI 
         Reverse  AAAAAAGTCGAC-CTATTTTTTAGGGGCTTTTGCTTGTTTGAAAAGCCTGCC  SalI 
ORF119   Forward  AAAGAATTC-TACAACATGTATCAGGAAAACCAATACCG  EcoRI 
         Reverse  AAACTGCAG-TTATGAAAACAGGCGCAGGGCGGTTTTGCC  PstI 
ORF120   Forward  AAAGAATTC-GCAAGGCTACCCCAATCCGCCGTG  EcoRI 
         Reverse  AAACTGCAG-CGGTTTGGCTGCCTGGCCGTTGAT  PstI 


ORF121   Forward  AAAGAATTC-GCCTTGGTCTGGCTGGTTTTCGC  EcoRI 
         Reverse  AAACTGCAG-TCATCCGCCACCCCACCTCGGCCATCCATC  PstI 
ORF122   Forward  AAAAAAGTCGAC-ATGTCrTACCGOGCAAGCAGTTCTCC  SalI 
         Reverse  AAACTGCAG-TCAGGAACACAAACGATGACGAATATCCGTATC  PstI 
ORF125   Forward  AAAGAATTC-GCGCTGTTTTTTGCGGCGGCGTAT  EcoRI 
         Reverse  AAACTGCAG-CGCCGTTTCAAGACGAAAAAGTCG  PstI 
ORF126   Forward  AAAGAATTC-GCGGAAACGGTCGAAG  EcoRI 
         Reverse  AAACTGCAG-TTAATCTTGTCTTCCGATATAC  PstI 
ORF127   Forward  AAAGAATTC-ATGACTGATAATCGGGGGTTTACG  EcoRI 
         Reverse  AAAAAAGTCGAC-CTTAAGTAACTTGCAGTCCTTATC  SalI 
ORF128   Forward  AAAGAATTC-ATGCAAGCTGTCCGCTACAGGCC  EcoRI 
         Reverse  AAACTGCAG-CTATTGCAATGCGCCGCCGCGGGAATGTTTGAGCAGGCG  PstI 
ORF129   Forward  AAAGAATTC-ATGGATTTTCGTTTTGACATTATTTACGAATACCG  EcoRI 
         Reverse  AAACTGCAG-TTATTTTTTGATGAAATTTTGGGGCGG  PstI 
ORF130   Forward  AAAGAATTC-GCAGTACTTGCCATTCTCGGTGCG  EcoRI 
         Reverse  AAACTGCAG-CTCCGGATCGTCTGTAAACGCATT  PstI 
ORF131   Forward  GCGGATCCCATATG-GAAATTCGGGCAATAAAAT  BamHI-NdeI 
         Reverse  CCCGCTCGAG-CCAGCGGACGCGTTC  XhoI 
ORF132   Forward  GCGGATCCCATATG-AAAGAAGCGGGGTTTG  BamHI-NdeI 
         Reverse  CCCGCTCGAG-CCAATCTGCCAGCCGT  XhoI 
ORF133   Forward  CGCGGATCCCATATG-GAAGATGCAGGGCGCG  BamHI-NdeI 
         Reverse  CCCGCTCGAG-AAACTTGTAGCTCATCGT  XhoI 
ORF134   Forward  GCGGATCCCATATG-TCTGTGCAAGCAGTATTG  BamHI-NdeI 
         Reverse  CCCGCTCGAG-ATCCTGTGCCAATGCG  XhoI 
ORF135   Forward  GCGGATCCCATATG-CCGTCTGAAAAAGCTTT  BamHI-NdeI 
         Reverse  CCCGCTCGAG-AAATACCGCTGAGGATG  XhoI 
ORF136   Forward  CGCGGATCCGCTAGC-ATGAAGCGGCGTATAGCC  BamHI-NheI 
         Reverse  CCCGCTCGAG-TTCCGAATATTTGGAACTTTT  XhoI 
ORF137   Forward  CGCGGATCCCATATG-GGCACGGCGGGAAATA  BamHI-NdeI 
         Reverse  CCCGCTCGAG-ATAACGGTATGCCGCC  XhoI 
ORF138   Forward  GCGGATCCCATATG-TTTCGTTTACAATTCAGGC  BamHI-NdeI 
         Reverse  CCCGCTCGAG-CGGCGTTTTATAGCGG  XhoI 


ORF139   Forward  GCGGATCCCATATG-GCTTTTTTGGCGGTAATG  BamHI-NdeI 
         Reverse  CCCGCTCGAG-TAACGTTTCCGTGCGTTT  XhoI 
ORF140   Forward  GCGGATCCCATATG-TTGCCCACAGGCAGC  BamHI-NdeI 
         Reverse  [sequence illegible]  XhoI 
ORF141   Forward  GCGGATCCCATATG-CCGTCTGAAGCAGTCT  BamHI-NdeI 
         Reverse  CCCnCTCCZACZ-ATCVCZTTCZTT^TTAAAATATT  XhoI 
ORF142   Forward  GCGGATCCCATATG-GATAATTCTGGTAGTGAAG  BamHI-NdeI 
         Reverse  [sequence illegible]  XhoI 
ORF143   Forward  GCGGATCCCATATG-GATACCGCTTTGAACCT  BamHI-NdeI 
         Reverse  [sequence illegible]  XhoI 
[ORF label illegible] 
         Forward  GCGGATCCCATATG-ACCTTTTTACAACGTTTGC  BamHI-NdeI 
         Reverse  CCCGCTCGAG-AGATTGTTGTTGTTTTTTCG  XhoI 
ORF147   Forward  GCGGATCCCATATG-TCTGTCTTTCAAACGGC  BamHI-NdeI 
         Reverse  CCCGCTCGAG-TTTGTTTTTGCAAGACAG  XhoI 



NB: 

- restriction sites are underlined 

- for ORFs 110-130, where the ORF itself carries an EcoRI site (eg. ORF122), a SalI site 
was used in the forward primer instead. Similarly, where the ORF carries a PstI site (eg. 
ORFs 115 and 127), a SalI site was used in the reverse primer. 
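
The site-selection rule in the note above, and the restriction sites carried in the primer tails, can be sketched as follows. This is an illustration only: the recognition sequences are the standard ones for these enzymes, while the function names `sites_in_tail` and `pick_sites` are hypothetical and are not part of the described method.

```python
# Standard recognition sequences of the enzymes named in Table I.
SITES = {
    "BamHI": "GGATCC", "NdeI": "CATATG", "NheI": "GCTAGC", "XhoI": "CTCGAG",
    "NcoI": "CCATGG", "EcoRI": "GAATTC", "PstI": "CTGCAG", "SalI": "GTCGAC",
}


def sites_in_tail(primer: str):
    """List enzymes whose site occurs in the 5' tail (the part before '-')."""
    tail = primer.split("-")[0].upper()
    hits = [(tail.find(seq), name) for name, seq in SITES.items() if seq in tail]
    return [name for _, name in sorted(hits)]  # ordered by position in the tail


def pick_sites(orf_dna: str):
    """Apply the note's rule for ORFs 110-130: fall back to SalI when the
    ORF itself contains the default site. All sites here are palindromic,
    so checking one strand is sufficient."""
    orf = orf_dna.upper()
    forward = "SalI" if SITES["EcoRI"] in orf else "EcoRI"
    reverse = "SalI" if SITES["PstI"] in orf else "PstI"
    return forward, reverse


print(sites_in_tail("CGCGGATCCGCTAGC-GGACACACTTATTTCGG"))  # ORF1 forward primer
print(pick_sites("ATGGAATTCAAA"))  # made-up ORF with an internal EcoRI site
```

Scanning only the tail keeps sites that happen to occur in the gene-specific part of a primer out of the result.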
TABLE II - Summary of cloning, expression and purification 

ORF      PCR/cloning  His-fusion expression  GST-fusion expression  Purification 

orf 1    +            +                      +                      His-fusion 
orf 2    +            +                                             GST-fusion 
orf 2.1  +            n.d.                   +                      GST-fusion 
orf 4    +            +                      +                      His-fusion 
orf 5 

CLAIMS 

1. A protein comprising an amino acid sequence selected from the group consisting of SEQ 
IDs 2, 4, 6, and 8. 

2. A nucleic acid molecule which encodes a protein according to claim 1. 

3. A nucleic acid molecule according to claim 2, comprising a nucleotide sequence selected 
from the group consisting of SEQ IDs 1, 3, 5, and 7. 

4. A protein comprising an amino acid sequence selected from the group consisting of SEQ
IDs 2, 4, 6, 8, 10, 12, 14, 16, 18, 20, 22, 24, 26, 28, 30, 32, 34, 36, 38, 40, 42, 44, 46, 48, 50, 52,
54, 56, 58, 60, 62, 64, 66, 68, 70, 72, 74, 76, 78, 80, 82, 84, 86, 88, 90, 92, 94, 96, 98, 100, 102,
104, 106, 108, 110, 112, 114, 116, 118, 120, 122, 124, 126, 128, 130, 132, 134, 136, 138, 140, 142,
144, 146, 148, 150, 152, 154, 156, 158, 160, 162, 164, 166, 168, 170, 172, 174, 176, 178, 180, 182,
184, 186, 188, 190, 192, 194, 196, 198, 200, 202, 204, 206, 208, 210, 212, 214, 216, 218, 220, 222,
224, 226, 228, 230, 232, 234, 236, 238, 240, 242, 244, 246, 248, 250, 252, 254, 256, 258, 260, 262,
264, 266, 268, 270, 272, 274, 276, 278, 280, 282, 284, 286, 288, 290, 292, 294, 296, 298, 300, 302,
304, 306, 308, 310, 312, 314, 316, 318, 320, 322, 324, 326, 328, 330, 332, 334, 336, 338, 340, 342,
344, 346, 348, 350, 352, 354, 356, 358, 360, 362, 364, 366, 368, 370, 372, 374, 376, 378, 380, 382,
384, 386, 388, 390, 392, 394, 396, 398, 400, 402, 404, 406, 408, 410, 412, 414, 416, 418, 420, 422,
424, 426, 428, 430, 432, 434, 436, 438, 440, 442, 444, 446, 448, 450, 452, 454, 456, 458, 460, 462,
464, 466, 468, 470, 472, 474, 476, 478, 480, 482, 484, 486, 488, 490, 492, 494, 496, 498, 500, 502,
504, 506, 508, 510, 512, 514, 516, 518, 520, 522, 524, 526, 528, 530, 532, 534, 536, 538, 540, 542,
544, 546, 548, 550, 552, 554, 556, 558, 560, 562, 564, 566, 568, 570, 572, 574, 576, 578, 580, 582,
584, 586, 588, 590, 592, 594, 596, 598, 600, 602, 604, 606, 608, 610, 612, 614, 616, 618, 620, 622,
624, 626, 628, 630, 632, 634, 636, 638, 640, 642, 644, 646, 648, 650, 652, 654, 656, 658, 660, 662,
664, 666, 668, 670, 672, 674, 676, 678, 680, 682, 684, 686, 688, 690, 692, 694, 696, 698, 700, 702,
704, 706, 708, 710, 712, 714, 716, 718, 720, 722, 724, 726, 728, 730, 732, 734, 736, 738, 740, 742,
744, 746, 748, 750, 752, 754, 756, 758, 760, 762, 764, 766, 768, 770, 772, 774, 776, 778, 780, 782,
784, 786, 788, 790, 792, 794, 796, 798, 800, 802, 804, 806, 808, 810, 812, 814, 816, 818, 820, 822,
824, 826, 828, 830, 832, 834, 836, 838, 840, 842, 844, 846, 848, 850, 852, 854, 856, 858, 860, 862,
864, 866, 868, 870, 872, 874, 876, 878, 880, 882, 884, 886, 888, 890, and 892.

5. A protein having 50% or greater sequence identity to a protein according to claim 4.




6. A protein comprising a fragment of an amino acid sequence selected from the group
consisting of SEQ IDs 2, 4, 6, 8, 10, 12, 14, 16, 18, 20, 22, 24, 26, 28, 30, 32, 34, 36, 38, 40, 42,
44, 46, 48, 50, 52, 54, 56, 58, 60, 62, 64, 66, 68, 70, 72, 74, 76, 78, 80, 82, 84, 86, 88, 90, 92, 94,
96, 98, 100, 102, 104, 106, 108, 110, 112, 114, 116, 118, 120, 122, 124, 126, 128, 130, 132, 134,
136, 138, 140, 142, 144, 146, 148, 150, 152, 154, 156, 158, 160, 162, 164, 166, 168, 170, 172, 174,
176, 178, 180, 182, 184, 186, 188, 190, 192, 194, 196, 198, 200, 202, 204, 206, 208, 210, 212, 214,
216, 218, 220, 222, 224, 226, 228, 230, 232, 234, 236, 238, 240, 242, 244, 246, 248, 250, 252, 254,
256, 258, 260, 262, 264, 266, 268, 270, 272, 274, 276, 278, 280, 282, 284, 286, 288, 290, 292, 294,
296, 298, 300, 302, 304, 306, 308, 310, 312, 314, 316, 318, 320, 322, 324, 326, 328, 330, 332, 334,
336, 338, 340, 342, 344, 346, 348, 350, 352, 354, 356, 358, 360, 362, 364, 366, 368, 370, 372, 374,
376, 378, 380, 382, 384, 386, 388, 390, 392, 394, 396, 398, 400, 402, 404, 406, 408, 410, 412, 414,
416, 418, 420, 422, 424, 426, 428, 430, 432, 434, 436, 438, 440, 442, 444, 446, 448, 450, 452, 454,
456, 458, 460, 462, 464, 466, 468, 470, 472, 474, 476, 478, 480, 482, 484, 486, 488, 490, 492, 494,
496, 498, 500, 502, 504, 506, 508, 510, 512, 514, 516, 518, 520, 522, 524, 526, 528, 530, 532, 534,
536, 538, 540, 542, 544, 546, 548, 550, 552, 554, 556, 558, 560, 562, 564, 566, 568, 570, 572, 574,
576, 578, 580, 582, 584, 586, 588, 590, 592, 594, 596, 598, 600, 602, 604, 606, 608, 610, 612, 614,
616, 618, 620, 622, 624, 626, 628, 630, 632, 634, 636, 638, 640, 642, 644, 646, 648, 650, 652, 654,
656, 658, 660, 662, 664, 666, 668, 670, 672, 674, 676, 678, 680, 682, 684, 686, 688, 690, 692, 694,
696, 698, 700, 702, 704, 706, 708, 710, 712, 714, 716, 718, 720, 722, 724, 726, 728, 730, 732, 734,
736, 738, 740, 742, 744, 746, 748, 750, 752, 754, 756, 758, 760, 762, 764, 766, 768, 770, 772, 774,
776, 778, 780, 782, 784, 786, 788, 790, 792, 794, 796, 798, 800, 802, 804, 806, 808, 810, 812, 814,
816, 818, 820, 822, 824, 826, 828, 830, 832, 834, 836, 838, 840, 842, 844, 846, 848, 850, 852, 854,
856, 858, 860, 862, 864, 866, 868, 870, 872, 874, 876, 878, 880, 882, 884, 886, 888, 890, and 892.

7. An antibody which binds to a protein according to any one of claims 4 to 6. 

8. A nucleic acid molecule which encodes a protein according to any one of claims 4 to 6.

9. A nucleic acid molecule according to claim 8, comprising a nucleotide sequence selected
from the group consisting of SEQ IDs 1, 3, 5, 7, 9, 11, 13, 15, 17, 19, 21, 23, 25, 27, 29, 31, 33, 35,
37, 39, 41, 43, 45, 47, 49, 51, 53, 55, 57, 59, 61, 63, 65, 67, 69, 71, 73, 75, 77, 79, 81, 83, 85, 87,
89, 91, 93, 95, 97, 99, 101, 103, 105, 107, 109, 111, 113, 115, 117, 119, 121, 123, 125, 127, 129,
131, 133, 135, 137, 139, 141, 143, 145, 147, 149, 151, 153, 155, 157, 159, 161, 163, 165, 167, 169,
171, 173, 175, 177, 179, 181, 183, 185, 187, 189, 191, 193, 195, 197, 199, 201, 203, 205, 207, 209,
211, 213, 215, 217, 219, 221, 223, 225, 227, 229, 231, 233, 235, 237, 239, 241, 243, 245, 247, 249,
251, 253, 255, 257, 259, 261, 263, 265, 267, 269, 271, 273, 275, 277, 279, 281, 283, 285, 287, 289,
291, 293, 295, 297, 299, 301, 303, 305, 307, 309, 311, 313, 315, 317, 319, 321, 323, 325, 327, 329,
331, 333, 335, 337, 339, 341, 343, 345, 347, 349, 351, 353, 355, 357, 359, 361, 363, 365, 367, 369,
371, 373, 375, 377, 379, 381, 383, 385, 387, 389, 391, 393, 395, 397, 399, 401, 403, 405, 407, 409,
411, 413, 415, 417, 419, 421, 423, 425, 427, 429, 431, 433, 435, 437, 439, 441, 443, 445, 447, 449,
451, 453, 455, 457, 459, 461, 463, 465, 467, 469, 471, 473, 475, 477, 479, 481, 483, 485, 487, 489,
491, 493, 495, 497, 499, 501, 503, 505, 507, 509, 511, 513, 515, 517, 519, 521, 523, 525, 527, 529,
531, 533, 535, 537, 539, 541, 543, 545, 547, 549, 551, 553, 555, 557, 559, 561, 563, 565, 567, 569,
571, 573, 575, 577, 579, 581, 583, 585, 587, 589, 591, 593, 595, 597, 599, 601, 603, 605, 607, 609,
611, 613, 615, 617, 619, 621, 623, 625, 627, 629, 631, 633, 635, 637, 639, 641, 643, 645, 647, 649,
651, 653, 655, 657, 659, 661, 663, 665, 667, 669, 671, 673, 675, 677, 679, 681, 683, 685, 687, 689,
691, 693, 695, 697, 699, 701, 703, 705, 707, 709, 711, 713, 715, 717, 719, 721, 723, 725, 727, 729,
731, 733, 735, 737, 739, 741, 743, 745, 747, 749, 751, 753, 755, 757, 759, 761, 763, 765, 767, 769,
771, 773, 775, 777, 779, 781, 783, 785, 787, 789, 791, 793, 795, 797, 799, 801, 803, 805, 807, 809,
811, 813, 815, 817, 819, 821, 823, 825, 827, 829, 831, 833, 835, 837, 839, 841, 843, 845, 847, 849,
851, 853, 855, 857, 859, 861, 863, 865, 867, 869, 871, 873, 875, 877, 879, 881, 883, 885, 887, 889,
and 891.

10. A nucleic acid molecule comprising a fragment of a nucleotide sequence selected from the
group consisting of SEQ IDs 1, 3, 5, 7, 9, 11, 13, 15, 17, 19, 21, 23, 25, 27, 29, 31, 33, 35, 37, 39,
41, 43, 45, 47, 49, 51, 53, 55, 57, 59, 61, 63, 65, 67, 69, 71, 73, 75, 77, 79, 81, 83, 85, 87, 89, 91,
93, 95, 97, 99, 101, 103, 105, 107, 109, 111, 113, 115, 117, 119, 121, 123, 125, 127, 129, 131, 133,
135, 137, 139, 141, 143, 145, 147, 149, 151, 153, 155, 157, 159, 161, 163, 165, 167, 169, 171, 173,
175, 177, 179, 181, 183, 185, 187, 189, 191, 193, 195, 197, 199, 201, 203, 205, 207, 209, 211, 213,
215, 217, 219, 221, 223, 225, 227, 229, 231, 233, 235, 237, 239, 241, 243, 245, 247, 249, 251, 253,
255, 257, 259, 261, 263, 265, 267, 269, 271, 273, 275, 277, 279, 281, 283, 285, 287, 289, 291, 293,
295, 297, 299, 301, 303, 305, 307, 309, 311, 313, 315, 317, 319, 321, 323, 325, 327, 329, 331, 333,
335, 337, 339, 341, 343, 345, 347, 349, 351, 353, 355, 357, 359, 361, 363, 365, 367, 369, 371, 373,
375, 377, 379, 381, 383, 385, 387, 389, 391, 393, 395, 397, 399, 401, 403, 405, 407, 409, 411, 413,
415, 417, 419, 421, 423, 425, 427, 429, 431, 433, 435, 437, 439, 441, 443, 445, 447, 449, 451, 453,
455, 457, 459, 461, 463, 465, 467, 469, 471, 473, 475, 477, 479, 481, 483, 485, 487, 489, 491, 493,
495, 497, 499, 501, 503, 505, 507, 509, 511, 513, 515, 517, 519, 521, 523, 525, 527, 529, 531, 533,
535, 537, 539, 541, 543, 545, 547, 549, 551, 553, 555, 557, 559, 561, 563, 565, 567, 569, 571, 573,
575, 577, 579, 581, 583, 585, 587, 589, 591, 593, 595, 597, 599, 601, 603, 605, 607, 609, 611, 613,
615, 617, 619, 621, 623, 625, 627, 629, 631, 633, 635, 637, 639, 641, 643, 645, 647, 649, 651, 653,
655, 657, 659, 661, 663, 665, 667, 669, 671, 673, 675, 677, 679, 681, 683, 685, 687, 689, 691, 693,
695, 697, 699, 701, 703, 705, 707, 709, 711, 713, 715, 717, 719, 721, 723, 725, 727, 729, 731, 733,
735, 737, 739, 741, 743, 745, 747, 749, 751, 753, 755, 757, 759, 761, 763, 765, 767, 769, 771, 773,
775, 777, 779, 781, 783, 785, 787, 789, 791, 793, 795, 797, 799, 801, 803, 805, 807, 809, 811, 813,
815, 817, 819, 821, 823, 825, 827, 829, 831, 833, 835, 837, 839, 841, 843, 845, 847, 849, 851, 853,
855, 857, 859, 861, 863, 865, 867, 869, 871, 873, 875, 877, 879, 881, 883, 885, 887, 889, and 891.

11. A nucleic acid molecule comprising a nucleotide sequence complementary to a nucleic acid
molecule according to any one of claims 8 to 10.

12. A nucleic acid molecule comprising a nucleotide sequence having 50% or greater sequence
identity to a nucleic acid molecule according to any one of claims 8-11.

13. A nucleic acid molecule which can hybridise to a nucleic acid molecule according to any
one of claims 8-12 under high stringency conditions.

14. A composition comprising a protein, a nucleic acid molecule, or an antibody according to 
any preceding claim. 

15. A composition according to claim 14 being a vaccine composition or a diagnostic 
composition. 

16. A composition according to claim 14 or claim 15 for use as a pharmaceutical.

17. The use of a composition according to claim 14 in the manufacture of a medicament for the
treatment or prevention of infection due to Neisserial bacteria.